A Unified Approach to Estimating Tail Behavior. (May 1989)

Scott  D  Grimshaw,  B.S.,  Southern  Utah  State  College; 

M.S.,  Texas  A&M  University 
Chair  of  Advisory  Committee:  Dr.  Emanuel  Parzen 

Tail  estimators  are  proposed  which  make  minimal  assumptions  and  let  the 
data  dictate  the  form  of  the  probability  model.  These  estimators  use  only  the 
observations  in  the  tail  and  are  based  on  a  unifying  density-quantile  model.  The 
fundamental  result  in  this  work  is  a  representation  of  the  quantile  function  of  the 
exceedences over a threshold. This representation (1) motivates a unified
parameterization for tail estimators of the underlying probability model; (2) motivates
methods  for  obtaining  parameter  estimates;  and  (3)  simplifies  the  derivation  of 
the  asymptotic  properties  of  the  proposed  parameter  estimates. 

Parameter estimates may be obtained using a Generalized Pareto Distribution
or a Generalized Extreme Value Distribution model of the exceedences.
Assuming the underlying distribution can be correctly classified as either short
tailed or long tailed, other estimates are formed. The asymptotic properties of
these estimates are derived under rate of convergence conditions to show the
effect of threshold selection on parameter properties.

The  parameters  are  shown  to  be  nonidentifiable  and  their  estimators  contain 
a  bias  which  may  approach  zero  very  slowly.  Therefore,  if  the  parameters  are  the 
focus  of  the  analysis,  extremely  large  sample  sizes  are  required  to  reduce  the  bias 
to  a  negligible  amount.  If  the  tail  estimates  are  of  interest,  the  bias  is  less  likely 
to  be  serious  and  the  nonidentifiability  problem  provides  a  closer  approximation 
to  the  tail  for  small  samples. 

1. INTRODUCTION

Suppose that the possible observed values from a population can be
characterized by a random variable X whose probability model is estimated using a
sample  from  the  population.  The  properties  of  this  estimated  probability  model 
which  correspond  to  the  population  characteristics  of  interest  are  the  foundation 
of  statistical  analysis. 

Three important functions of a probability model for a continuous random
variable are the absolutely continuous distribution function F(x), the quantile
function Q(u), and the density function f(x). The significance of these three
functions in statistical analysis follows from their interpretation as key properties
of the population.

For example, the distribution function F(x) is the probability that an
observed value from the population will be less than or equal to a given value of x,
i.e. $F(x) = P[X \le x]$. In applications where the observed values are times until
failure or death, the distribution function for a given value of x is the probability
that the lifetime will be less than or equal to x. A more optimistic expression of
this information is the survival function used in reliability. The probability that
the lifetime will exceed a given value of x is $S(x) = P[X > x] = 1 - F(x)$.

The quantile function Q(u) is the smallest value of x such that the probability
of a value less than or equal to x is at least u, i.e.
$Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}$. In applications, the quantile function is
used to determine the value of x such that an observed value of this magnitude (or
smaller) occurs with probability u for a given value of u.

The density function f(x) of an absolutely continuous distribution function
represents the probability X is in the interval (a, b) for a < b as the area under
the density function between a and b, i.e. $P[a < X < b] = \int_a^b f(x)\,dx$. The
density function is used to describe many properties of the population graphically.
Characteristics such as modality and skewness are evident from plots of f(x).

In some applications, the population characteristics of primary interest
correspond to the tails of the distribution function, quantile function, and density
function. For example, an experimenter investigating the lifetime of a product
will want the probability of an early death or an exceptionally long life, that
is, the value of the distribution function F(x) for values of x with F(x) near
zero or values of x with F(x) near one. A hydrologist analyzing an annual flood
record will want the magnitude of rare high level floods, that is, the value of the
quantile function Q(u) for values of u near one. An experimenter may investigate
the tails of the density function f(x) to graphically display the concentration of
possible values at the extremes.

The format and style follow that of The Annals of Statistics.

This work focuses on the problem of estimating the tails of F(x), Q(u), and
f(x) from a random sample. The most basic estimators of F(x) and Q(u) from
a sample of size n are the sample distribution function defined as

\[ \tilde{F}(x) = \{\text{fraction of the observed values less than or equal to } x\}, \quad x \in \mathbb{R}, \]

and the sample quantile function defined as

\[ \tilde{Q}(u) = \{[nu + 1]\text{th smallest observed value}\}, \quad 0 < u < 1, \]

where $[\cdot]$ denotes the greatest integer operation. Nonparametric density
estimators follow this same vein as basic estimators of the density function.
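
A minimal Python sketch of these two basic estimators may help fix the definitions. The function names and the small data set are illustrative assumptions, not part of the original work; the quantile is taken as the [nu + 1]th value of the sample sorted in ascending order.

```python
import math

def sample_distribution(sample):
    """Return F~(x): the fraction of observed values less than or equal to x."""
    xs = sorted(sample)
    n = len(xs)

    def F_tilde(x):
        return sum(1 for v in xs if v <= x) / n

    return F_tilde

def sample_quantile(sample):
    """Return Q~(u): the [nu + 1]th value of the sorted sample, 0 < u < 1."""
    xs = sorted(sample)
    n = len(xs)

    def Q_tilde(u):
        if not 0.0 < u < 1.0:
            raise ValueError("u must lie strictly between 0 and 1")
        k = math.floor(n * u + 1)      # greatest integer operation [nu + 1]
        return xs[min(k, n) - 1]       # min() only guards against rounding

    return Q_tilde

data = [3.1, 0.4, 2.2, 5.0, 1.7]       # hypothetical observations
F_tilde = sample_distribution(data)
Q_tilde = sample_quantile(data)
print(F_tilde(2.2), Q_tilde(0.5), Q_tilde(0.9))
```

With only five observations the upper quantiles are all pinned to the largest value, which is the limitation of these estimators in the tail that motivates the rest of the work.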

The sample distribution function, sample quantile function, and
nonparametric density estimate are typically used in early stages of statistical
analysis since they make minimal assumptions on the underlying probability model.
These estimators are important data analytic tools used as other known
characteristics of the population are incorporated to formulate other estimates.

The classical approach to tail estimation is to assume the underlying
probability model belongs to some known class $\mathcal{P}$ whose elements are indexed by a
parameter $\theta$ taking values in a set $\Theta$, i.e. $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$. The
distribution function, quantile function, and density function then have parametric
representations $F(x;\theta)$, $Q(u;\theta)$, and $f(x;\theta)$. Tail estimates are given by
$F(x;\hat{\theta})$, $Q(u;\hat{\theta})$, and $f(x;\hat{\theta})$, where $\hat{\theta}$ denotes an estimate of the
parameter $\theta$ based on the sample from the population.

The beauty of this classical parametric approach is tarnished by what Fisher
(1948) called the problem of specification. Often it is difficult to select a single
parametric family for the population. Several candidates may appear reasonable
judging from their fit to the observed values.

To  demonstrate  this  complication,  suppose  that  a  random  sample  is  taken 
from  a  population  characterized  by  a  symmetric  unimodal  probability  model. 
Two possible parametric families are the normal and the Cauchy. Figure 1
contains graphs of estimated F(x), Q(u), and f(x) when a sample of n = 20 from a
symmetric unimodal probability model is treated as a sample from a normal
distribution and a Cauchy distribution. Both estimates are overlaid on the sample
distribution function, sample quantile function, and a kernel density estimator.


FIG. 1. Estimated F(x), Q(u), and f(x) when a sample of n = 20 from a
symmetric unimodal probability model is treated as a sample from a normal
distribution (solid line with blocks) and a Cauchy distribution (dotted line). Estimates
are overlaid on the sample distribution function, sample quantile function, and
a kernel estimate of the density function (solid line). The normal and Cauchy
modeling lead to very different tail inference despite yielding similar inference for
central values.


Notice that the two parametric estimators yield similar inference for central
values of the random variable. However, the focus of this work is on tail values,
not central values, and inference at the tails is quite different under the two
parametric models. Extremely small and extremely large values are much more
likely under the Cauchy modeling. The distribution function F(x) approaches
zero and one much more rapidly under the normality assumption. The quantile
function Q(u) for the Cauchy model decreases more rapidly in a neighborhood of
zero and increases more rapidly in a neighborhood of one. The density function
f(x) for the Cauchy model has much more area in the tail.
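
The contrast can be made concrete with the two models themselves rather than with fitted values. The sketch below assumes a normal and a Cauchy model scaled to share the same quartiles (an illustrative choice, not the fit used for Figure 1) and compares their upper quantiles and survival probabilities.

```python
import math
from statistics import NormalDist

def cauchy_quantile(u, scale=1.0):
    """Quantile function of a Cauchy model centered at zero."""
    return scale * math.tan(math.pi * (u - 0.5))

def cauchy_survival(x, scale=1.0):
    """P[X > x] for a Cauchy model centered at zero."""
    return 0.5 - math.atan(x / scale) / math.pi

# Assumption: scale the normal so that both models have Q(0.75) = 1,
# which makes their central behavior comparable.
normal = NormalDist(mu=0.0, sigma=1.0 / NormalDist().inv_cdf(0.75))

for u in (0.75, 0.95, 0.99, 0.999):
    print(f"u={u}: normal Q(u)={normal.inv_cdf(u):8.3f}  "
          f"Cauchy Q(u)={cauchy_quantile(u):8.3f}")

x = 5.0
print("P[X > 5]: normal", 1.0 - normal.cdf(x), " Cauchy", cauchy_survival(x))
```

The two quantile functions agree at the quartiles by construction and then separate quickly as u approaches one.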

It is very difficult to discriminate between the different possible
parameterizations even when the possible parametric models specify very different tail
properties. In this example, the sample size is too small for a goodness of fit test
to have sufficient power to detect differences in the observed tail and the fitted
tail under the normal and Cauchy modeling. The tails of the sample
distribution function, sample quantile function, and nonparametric density estimates
have insufficient observations in the tail to indicate important properties of tail
behavior.

This work proposes estimators of F(x), Q(u), and f(x) which are applicable
under minimal assumptions. These estimators can be used in applications where
little  is  known  about  the  underlying  population.  The  estimators  can  also  be 
used  in  a  data  analytic  sense  to  validate  tail  behavior  properties  in  probability 
modeling  applications.  The  work  is  outlined  as  follows. 

Section 2 proposes the model for tail behavior, defines tail behavior
parameters, and summarizes the characteristics of these parameters. The model for
tail behavior is a basic result from which two approaches to tail estimation can
be unified. Generally applicable tail estimates are proposed using only those
observations which exceed a threshold value, i.e. the observations in the tail.

The fundamental result of this work is stated in this section. The quantile
function for the exceedences can be represented as the sum of a function which
can be parameterized and a deterministic error function demonstrating the
dependence on the threshold value. This representation motivates a parametric
tail estimation model, motivates methods for obtaining parameter estimates,
and simplifies the derivation of asymptotic properties of the proposed parameter
estimates.

Three approaches to the problem of parameter estimation are considered.
The first two treat the exceedences as a random sample from a parametric
family motivated from the representation for the quantile function of the exceedences.
Section 3 investigates the Generalized Pareto distribution (GPD) modeling and
Section 4 investigates the Generalized Extreme Value (GEV) distribution
modeling.

An innovative approach to tail parameter estimates using the ideas of
continuous parameter time series on the quantile process is introduced. These ideas
are  stimulated  from  the  work  of  Parzen  (1979)  on  location  and  scale  parameter 
estimates. 

The  most  popular  choice  for  parameter  estimates  is  maximum  likelihood. 
A new algorithm is proposed for the numerical computation of the GPD
maximum likelihood estimates. This algorithm corrects the inadequacies of common
Newton-Raphson  type  algorithms. 

The  second  approach  to  tail  estimates  follows  from  representations  which 
are  derived  from  the  general  tail  behavior  model.  Section  5  proposes  parameter 
estimates  based  on  the  largest  order  statistics  assuming  a  parametric  model  is 
valid  beyond  the  threshold.  The  properties  of  these  estimators  are  treated  in  two 
cases  since  the  parameterization  for  the  tail  depends  on  a  prior  assumption  on 
the  tail  behavior. 

A  comparison  of  the  different  parameter  estimates  is  made  in  Section  6.  All 
the  estimators  are  shown  to  be  biased,  and  no  global  statements  can  be  made 
regarding  an  ‘optimal’  estimator.  A  popular  use  of  the  parameter  estimates  is  as 
diagnostics  for  existence  of  variance  and  higher  order  moments.  However,  great 
caution  must  be  exercised  in  interpreting  parameter  estimates  for  reasons  given 
in  this  section. 

Section  7  discusses  the  important  question  of  threshold  selection.  In  order 
to  reduce  the  bias,  the  threshold  must  be  chosen  as  large  as  possible.  However, 
this  reduces  the  number  of  observations  used  in  the  estimators  and  inflates  the 
variance of the estimates. A threshold selection procedure is proposed which
minimizes the distance between the estimated distribution function and the sample
distribution function over the tail values.

A motivating example is provided in Section 8. This example considers the
problem of estimating the tail of the quantile function for two rivers
from a history of observed annual floods. The high dependency on the choice of
parametric  family  is  demonstrated.  The  tail  estimators  proposed  in  this  work 
are  applied  as  alternative  estimators  which  make  minimal  assumptions  on  the 
underlying  probability  model. 

Concluding  remarks  are  made  in  Section  9. 

2. EXCEEDENCE OVER THRESHOLD APPROACH
TO TAIL ESTIMATES

2.1.  A  Unifying  Model  For  Tail  Behavior 

2.1.1. Notation. Let $X_1, \ldots, X_n$ be a random sample from a population
with strictly increasing, absolutely continuous distribution function F(x), density
function $f(x) = F'(x)$, quantile function

\[ Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}, \quad 0 < u < 1, \]

density-quantile function $fQ(u) = f \circ Q(u)$, and quantile density function
$q(u) = Q'(u)$. Notice that $fQ(u) \cdot q(u) = 1$.
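
The identity follows from differentiating F(Q(u)) = u. A short derivation, with the unit exponential model as an assumed illustration, is:

```latex
\frac{d}{du} F(Q(u)) = f(Q(u))\, Q'(u) = fQ(u) \cdot q(u) = \frac{d}{du}\, u = 1 .
% Illustration (unit exponential, an assumed example): F(x) = 1 - e^{-x}, so
Q(u) = -\ln(1-u), \qquad q(u) = \frac{1}{1-u}, \qquad fQ(u) = e^{-Q(u)} = 1-u ,
\qquad fQ(u) \cdot q(u) = 1 .
```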

From the random sample, define the sample distribution function

\[ \tilde{F}(x) = \frac{i}{n}, \quad X(i;n) \le x < X(i+1;n), \quad i = 0, 1, \ldots, n, \]

and the sample quantile function

\[ \tilde{Q}(u) = X(i;n), \quad \frac{i-1}{n} < u \le \frac{i}{n}, \quad i = 1, \ldots, n, \]

where X(i;n) denotes the ith order statistic in the random sample of size n,
$X(0;n) = -\infty$, and $X(n+1;n) = \infty$.

2.1.2. Tail Behavior Model. Parzen (1979) has suggested that the behavior
of the density-quantile function fQ(u) in the neighborhood of u = 0 and u = 1
can be used to classify the tail behavior of a probability model. The classification
of any continuous probability model follows from expressing

\[ fQ(1 - u) = u^{p+1} L(u), \tag{2.1.1} \]

where p is called the right tail exponent of the probability model and L(u) is a
slowly varying function as $u \to 0^+$, i.e. L(u) is a positive measurable function
defined on $[0, \infty)$ satisfying

\[ \lim_{u \to 0^+} \frac{L(\lambda u)}{L(u)} = 1 \quad \text{for all } \lambda > 0. \]


Table 1 contains examples of common parametric probability models from
Appendix A expressed according to (2.1.1).
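
As a rough numerical check of (2.1.1), the tail exponent can be read off as the slope of log fQ(1 - u) against log u, minus one, for small u. The sketch below is an illustration under assumed standard (unit-scale) forms of four models, so its constants need not match the scale constants of Appendix A; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def tail_exponent_slope(fQ_upper, u):
    """Slope of log fQ(1-u) against log u between u/2 and u, minus one.

    fQ_upper(u) must return fQ(1 - u). The slope equals p plus a term
    log2(L(u)/L(u/2)) that vanishes as u -> 0+ when L is slowly varying.
    """
    return (math.log(fQ_upper(u)) - math.log(fQ_upper(u / 2.0))) / math.log(2.0) - 1.0

def fQ_pareto(u, alpha=3.0):
    """Pareto model F(x) = 1 - x**(-alpha), x >= 1: fQ(1-u) = alpha*u**(1+1/alpha)."""
    return alpha * u ** (1.0 + 1.0 / alpha)

def fQ_exponential(u):
    """Unit exponential model: fQ(1-u) = u."""
    return u

def fQ_cauchy(u):
    """Standard Cauchy model: fQ(1-u) = sin(pi*u)**2 / pi."""
    return math.sin(math.pi * u) ** 2 / math.pi

_STD_NORMAL = NormalDist()

def fQ_normal(u):
    """Standard normal model; by symmetry fQ(1-u) = fQ(u)."""
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(u))

for name, fQ, u in [("Pareto(3)", fQ_pareto, 1e-4),
                    ("Exponential", fQ_exponential, 1e-4),
                    ("Cauchy", fQ_cauchy, 1e-4),
                    ("Normal", fQ_normal, 1e-6)]:
    print(name, tail_exponent_slope(fQ, u))
```

The Pareto, exponential, and Cauchy slopes settle at 1/3, 0, and 1 almost immediately, while the normal slope approaches 0 only slowly because its slowly varying factor behaves like a power of -ln u.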

An  associated  left  tail  exponent  can  be  defined  also.  However,  this  work 
considers  only  the  right  tail  without  loss  of  generality  since  applications  to  the 
left  tail  can  be  made  by  negating  the  random  variable. 

The tail exponent p is finite if and only if $X_1$ has a finite moment of order $\delta$
for some $\delta > 0$. In this research, only continuous probability models where (2.1.1)
holds with finite p are considered. However, this is not an all-inclusive family. For
example, a random variable with distribution function $F(x) = 1 - (\ln x)^{-1}$, $x > e$,
has $p = \infty$ (and hence no finite moments).

Estimating the tail exponent has become popular because $p < 1/\delta$ if and
only if $E|X_1|^{\delta} < \infty$. In particular, testing $H_0 : p < 1/2$ is used as a diagnostic for
finite variance. Other work on tail exponent estimation expresses the distribution
function as $F(x) = 1 - x^{-\alpha} L^*(x)$, where $\alpha > 0$ and $L^*(x)$ is a slowly varying
function as $x \to \infty$. It is shown in Section 5 that these two parameterizations
for the tail exponent satisfy $p\alpha = 1$ for $\alpha, p > 0$.

The  use  of  slowly  varying  functions  in  defining  the  tail  exponent  is  just  one 
application  of  the  concept  introduced  in  1930  by  J.  Karamata  as  a  suitable  class  of 
functions in connection with a Tauberian theorem for Laplace transforms.
Bingham, Goldie, and Teugels (1987) review the generalization to regularly varying
functions and provide examples of applications to probability theory in the
areas of stability and domains of attraction, central limit theory, renewal theory,
queues,  occupation  times,  and  extreme  value  theory. 

Examples of slowly varying functions as $u \to 0^+$ include:

(i) any positive measurable functions with positive limits at zero; for
example, $L(u) = A[1 + O(r(u))]$ where A > 0 and $r(\cdot)$ is a positive measurable
function with $\lim_{u \to 0^+} r(u) = 0$;

(ii) $L(u) = -\ln u$;

(iii) $L(u) = \ln \ln \cdots (-\ln u)$;

(iv) $L(u) = \exp\{[-\ln u]^{\alpha_1} [\ln(-\ln u)]^{\alpha_2} \cdots [\ln \ln \cdots \ln(-\ln u)]^{\alpha_k}\}$, where k
is a positive integer and $0 < \alpha_i < 1$ for $i = 1, \ldots, k$;

Table 1

Common parametric probability models expressed as tail behavior models where
$fQ(1 - u) = u^{p+1} L(u)$. The parameter p is the tail exponent and L(u) is a
slowly varying function as $u \to 0^+$. In some cases, an asymptotically equivalent
expression for L(u) is given which is in the form of the slowly varying function
examples.

Distribution       Density-Quantile Function

Uniform            $fQ(1-u) = u^{-1+1} \cdot 1$

Neg. Exponential   $fQ(1-u) = u^{-1+1} \cdot (2 \ln 3)(1-u)$

Neg. Weibull(p)    $fQ(1-u) = u^{(-1/p)+1} \cdot \sigma p (1-u) [-u^{-1} \ln(1-u)]^{(-1/p)+1}$
                   $\sim u^{(-1/p)+1} \cdot \sigma p [1 - .5(1 + p^{-1}) u]$ as $u \to 0^+$

Exponential        $fQ(1-u) = u^{0+1} \cdot (2 \ln 3)$

Logistic           $fQ(1-u) = u^{0+1} \cdot (4 \ln 3)(1-u)$

Normal             $fQ(1-u) = u^{0+1} \cdot \sigma \phi[\Phi^{-1}(1-u)]/u$
                   $\sim u^{0+1} \cdot \sigma (-2 \ln u)^{1/2}$ as $u \to 0^+$

Weibull(p)         $fQ(1-u) = u^{0+1} \cdot \sigma p (-\ln u)^{(-1/p)+1}$

Lognormal          $fQ(1-u) = u^{0+1} \cdot \sigma \phi[\Phi^{-1}(1-u)]/u \cdot e^{-\Phi^{-1}(1-u)}$
                   $\sim u^{0+1} \cdot \sigma (-2 \ln u)^{1/2} \cdot e^{-\Phi^{-1}(1-u)}$ as $u \to 0^+$

Cauchy             $fQ(1-u) = u^{1+1} \cdot (4/\pi) [\sin^2 \pi(1-u)]/u^2$
                   $\sim u^{1+1} \cdot 4\pi [1 - (\pi^2/3) u^2]$ as $u \to 0^+$

Pareto(p)          $fQ(1-u) = u^{(1/p)+1} \cdot \sigma p$

Fréchet(p)         $fQ(1-u) = u^{(1/p)+1} \cdot \sigma p (1-u) [-u^{-1} \ln(1-u)]^{(1/p)+1}$
                   $\sim u^{(1/p)+1} \cdot \sigma p [1 - .5(1 - p^{-1}) u]$ as $u \to 0^+$

Note: See Appendix A for the definition of $\sigma$, a different scale constant for each
different distribution.

(v) $L(u) = \exp\{[-\ln u]/[\ln(-\ln u)]\}$.

Further examples of slowly varying functions can be generated from those given
by noting two properties of slowly varying functions:

(a) if L(u) varies slowly as $u \to 0^+$, so does $[L(u)]^{\alpha}$ for all $\alpha \in \mathbb{R}$;

(b) if $L_1(u)$ and $L_2(u)$ vary slowly as $u \to 0^+$, so do $L_1(u) \cdot L_2(u)$ and
$L_1(u) + L_2(u)$. Further, if $L_2(u) \to 0$ as $u \to 0^+$, then $L_1 \circ L_2(u)$ is also
slowly varying as $u \to 0^+$.

Some  of  the  tail  behavior  representations  given  in  Table  1  contain  asymptotically 
equivalent  expressions  for  L(u)  as  examples  of  the  formulations  for  slowly  varying 
functions  and  the  properties  given  above. 
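
The defining limit for slow variation can be checked numerically. The sketch below is an illustration only: it evaluates the ratio L(λu)/L(u) for L(u) = -ln u, which is slowly varying, and for the power function u^0.1, which is not; the function name and the chosen values of λ and u are assumptions made for the example.

```python
import math

def slow_variation_ratio(L, lam, u):
    """Return L(lam * u) / L(u); it tends to 1 as u -> 0+ when L varies slowly."""
    return L(lam * u) / L(u)

def neg_log(u):
    """Example (ii): L(u) = -ln u."""
    return -math.log(u)

def small_power(u):
    """Not slowly varying: the ratio stays at lam**0.1 for every u."""
    return u ** 0.1

for u in (1e-3, 1e-30, 1e-300):
    print(u, slow_variation_ratio(neg_log, 2.0, u),
          slow_variation_ratio(small_power, 2.0, u))
```

The ratio for -ln u does approach one, but only at a logarithmic rate, which foreshadows the slow convergence of the error term discussed in Section 2.2.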

2.2.  Conditional  Distribution  Of  The  Exceedence  Over  A  Threshold 

This  research  proposes  estimates  of  the  tails  of  the  distribution  function, 
density  function,  and  quantile  function  for  the  family  of  random  variables  with 
finite  p.  These  estimators  use  only  the  exceedences  over  a  high  threshold  value. 
This  approach  allows  the  observed  values  in  the  tail  to  dictate  the  tail  estimate. 

The exceedence over a threshold is denoted by X - T given X > T for a given
threshold T satisfying Q(0) < T < Q(1). It is easily shown that the exceedences
have distribution function

\[ F_{X-T \mid X>T}(x) = \frac{F(T + x) - F(T)}{1 - F(T)}, \quad x > 0. \]


This  distribution  function  expression  is  used  by  other  authors  to  derive  properties 
of  tail  estimates  and  tail  exponent  estimates. 
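
The conditional distribution function of the exceedences can be written directly from any distribution function F. The sketch below is illustrative; the two models (unit exponential, and a Pareto model with index 2) are assumed examples chosen because their exceedence distributions are known in closed form.

```python
import math

def exceedance_cdf(F, T):
    """Return x -> [F(T + x) - F(T)] / [1 - F(T)] for x >= 0."""
    FT = F(T)
    if not 0.0 <= FT < 1.0:
        raise ValueError("threshold must satisfy F(T) < 1")

    def F_exc(x):
        if x < 0:
            raise ValueError("exceedences are nonnegative")
        return (F(T + x) - FT) / (1.0 - FT)

    return F_exc

def exp_cdf(x):
    """Unit exponential distribution function."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def pareto_cdf(x):
    """Pareto distribution function F(x) = 1 - x**(-2), x >= 1."""
    return 1.0 - x ** -2.0 if x >= 1 else 0.0

exp_exc = exceedance_cdf(exp_cdf, T=2.0)        # memoryless: same as exp_cdf
pareto_exc = exceedance_cdf(pareto_cdf, T=4.0)  # equals 1 - (1 + x/4)**(-2)
print(exp_exc(1.5), exp_cdf(1.5))
print(pareto_exc(4.0))
```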

However,  this  work  suggests  a  representation  for  the  quantile  function  of  the 
exceedences  which 

(1) motivates a unified parameterization for the tail of F(x), f(x), and Q(u);

(2) motivates parameter estimates; and

(3) simplifies the derivation of the asymptotic properties of the parameter
estimates and the tail estimates for F(x), f(x), and Q(u).

Before stating this representation in the following theorem, define the hazard
quantile function as

\[ hQ(u) = \frac{fQ(u)}{1 - u}, \quad 0 < u < 1, \]

and the power transformation (also called the Box-Cox (1964) transformation)
for z > 0 and $\lambda \in \mathbb{R}$ as

\[ g(z; \lambda) = \begin{cases} \dfrac{z^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\[1ex] \ln z, & \lambda = 0. \end{cases} \]

THEOREM 2.2.1. Suppose that $fQ(1 - u) = u^{p+1} L(u)$, where $p \in \mathbb{R}$ and
L(u) is slowly varying as $u \to 0^+$. Then

\[ Q_{X-T \mid X>T}(u; T) = \frac{1}{hQ(1 - t^*)} \left[ -g(1 - u; -p) + e(t^*, 1 - u, p) \right], \tag{2.2.1} \]

where $t^* = 1 - F(T)$ and

\[ e(t, u, p) = \int_u^1 v^{-(p+1)} \left[ \frac{L(t)}{L(tv)} - 1 \right] dv. \]

The proof of this theorem and those that follow in this subsection are given
in Appendix B.

The representation given by (2.2.1) expresses the quantile function for the
exceedences as the sum of two functions. The first does not depend on the
threshold and motivates a parametric model for the quantile function of the exceedences
based on the tail exponent p. The deterministic error function e(t, u, p) expresses
the systematic bias of this parameterization.
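
For models whose slowly varying function L(u) is constant, the error function vanishes and the representation is exact, which gives a simple numerical check. The sketch below assumes the usual hazard quantile definition hQ(u) = fQ(u)/(1 - u) and obtains the exceedence quantile function by inverting the conditional distribution function, Q(1 - t*(1 - u)) - Q(1 - t*). The Pareto and exponential models and all names are illustrative assumptions.

```python
import math

def box_cox(z, lam):
    """Power transformation g(z; lam) for z > 0."""
    if z <= 0:
        raise ValueError("z must be positive")
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def exceedance_quantile(Q, t_star, u):
    """Quantile function of X - T given X > T, where T = Q(1 - t_star)."""
    return Q(1.0 - t_star * (1.0 - u)) - Q(1.0 - t_star)

def leading_term(p, hQ_upper, t_star, u):
    """-g(1 - u; -p) / hQ(1 - t_star); hQ_upper(t) must return hQ(1 - t)."""
    return -box_cox(1.0 - u, -p) / hQ_upper(t_star)

# Pareto model F(x) = 1 - x**(-alpha), x >= 1: fQ(1-u) = alpha*u**(1+1/alpha),
# so p = 1/alpha and L(u) = alpha is constant.
ALPHA = 2.0
P = 1.0 / ALPHA

def pareto_Q(u):
    return (1.0 - u) ** (-P)

def pareto_hQ_upper(t):
    return ALPHA * t ** P            # hQ(1 - t) = fQ(1 - t) / t

# Unit exponential model: fQ(1-u) = u, so p = 0 and L(u) = 1.
def exp_Q(u):
    return -math.log(1.0 - u)

def exp_hQ_upper(t):
    return 1.0

t_star, u = 0.05, 0.9
print(exceedance_quantile(pareto_Q, t_star, u),
      leading_term(P, pareto_hQ_upper, t_star, u))
print(exceedance_quantile(exp_Q, t_star, u),
      leading_term(0.0, exp_hQ_upper, t_star, u))
```

For models with a non-constant L(u), the two printed values would differ by the scaled error term, and the difference would shrink as t* decreases.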

The convergence of e(t, u, p) to zero as $T \to Q(1)^-$ is an important property.
The following theorem states the uniform convergence and a rate of convergence
result for the deterministic error function.

THEOREM 2.2.2. Suppose that $fQ(1 - u) = u^{p+1} L(u)$, where $p \in \mathbb{R}$ and
L(u) is slowly varying as $u \to 0^+$.

(a) Then, for every $0 < \delta < 1$, $\lim_{T \to Q(1)^-} e(t^*, u, p) = 0$ uniformly in
$\delta < u < 1$, where $t^* = 1 - F(T)$.

(b) Further suppose that

\[ \left| \frac{L(t)}{L(tu)} - 1 \right| \le A(u) R(t) \]

for some positive measurable functions A(u) and R(t) where $\lim_{t \to 0^+} R(t) = 0$.
Then there exists a positive measurable function $A^*(u)$ such that
$|e(t, u, p)| \le A^*(u) R(t)$.


Previous work with the exceedences assumes $T \to Q(1)^-$ which permits the
simplification with $e(t^*, 1 - u, p) = 0$. The effect of the rate at which the threshold
converges is revealed in the generalization in this work to rates of convergence for
e(t, u, p). For example, the most popular expression for L(u) in the exceedence
literature is $L(u) = A[1 + O(u^{\gamma})]$, for A > 0, $\gamma > 0$. In this case, $T \to Q(1)^-$
such that $R(t^*) = (t^*)^{\gamma} \to 0$ where $t^* = 1 - F(T)$. However, the thresholds may
be required to converge much more rapidly. If, for example, $L(u) = -\ln u$, then
$T \to Q(1)^-$ satisfying $R(t^*) = -1/\ln t^* \to 0$.

2.3.  Tail  Estimates  Based  On  Exceedences  Over  A  Threshold 

An important use of the representation (2.2.1) for the quantile function of
the exceedences is the parametric model suggested for the tails of the quantile
function, distribution function, and density function of the underlying
population.

To begin motivating this parameterization, first notice that the hazard
quantile function $hQ(\cdot)$ is used in (2.2.1) as a standardization, but any positive
measurable function $a(\cdot)$ satisfying $\lim_{t \to 0^+} a(t) \cdot hQ(1 - t) = 1$ may replace
$1/hQ(1 - t)$. This gives the more general expression

\[ Q_{X-T \mid X>T}(u; T) = a(t^*) \left[ -g(1 - u; -p) + e(t^*, 1 - u, p) \right], \tag{2.3.1} \]

where $t^* = 1 - F(T)$. The effect on the rate of convergence result is that
$|e(t^*, 1 - u, p)| \le A^*(u) R^*(t^*)$ if $a(t) hQ(1 - t) = 1 + O(R_1(t))$, where
$R^*(t) = \max\{R(t), R_1(t)\}$.

The quantile function of the underlying population can be expressed by
unconditioning the quantile function of the exceedences. The expression for the
distribution function follows by inversion, and from that, the density function
expression follows by differentiation. Hence, the tails can be written as

\[ Q(u) = T + a(t^*) \left[ -g\!\left( \frac{1-u}{t^*}; -p \right) + e_Q(t^*, u, p) \right] \tag{2.3.2} \]

for $1 - t^* < u < 1$,

\[ F(x) = 1 - t^* \, g^{-1}\!\left( -\frac{1}{a(t^*)}(x - T); -p \right) + e_F(t^*, x, p) \tag{2.3.3} \]

for $T < x < Q(1)$,

\[ f(x) = \frac{t^*}{a(t^*)} \, (g^{-1})'\!\left( -\frac{1}{a(t^*)}(x - T); -p \right) + e_f(t^*, x, p) \tag{2.3.4} \]

for $T < x < Q(1)$, where

\[ g^{-1}(z; \lambda) = \begin{cases} (1 + \lambda z)^{1/\lambda}, & \lambda < 0, \; z < 0 \\ e^{z}, & \lambda = 0, \; z < 0 \\ (1 + \lambda z)^{1/\lambda}, & \lambda > 0, \; -1/\lambda < z < 0 \end{cases} \]

and

\[ (g^{-1})'(z; \lambda) = \begin{cases} (1 + \lambda z)^{(1/\lambda) - 1}, & \lambda < 0, \; z < 0 \\ e^{z}, & \lambda = 0, \; z < 0 \\ (1 + \lambda z)^{(1/\lambda) - 1}, & \lambda > 0, \; -1/\lambda < z < 0. \end{cases} \]

It is easy to show that if $|e(t, u, p)| \le A^*(u) R^*(t)$ for some positive
measurable functions $A^*(u)$ and $R^*(t)$ where $\lim_{t \to 0^+} R^*(t) = 0$, then
$|e_Q(t, u, p)| \le A_Q^*(u) R^*(t)$, $|e_F(t, x, p)| \le A_F^*(x) \cdot t \, R^*(t)$, and
$|e_f(t, x, p)| \le A_f^*(x) \cdot t \, R^*(t)$ for some positive measurable functions
$A_Q^*(u)$, $A_F^*(x)$, and $A_f^*(x)$.

A parameterization for the tail of Q(u), F(x), and f(x) can be motivated
by assuming the functions $e_Q(t, u, p) = 0$, $e_F(t, x, p) = 0$, $e_f(t, x, p) = 0$ and
treating p and $a = a(t^*)$, a scalar given the threshold, as parameters. Sections
3-5 propose different parameter estimates for the tail exponent p and the scaling
parameter a which have not previously been considered under a unified theory.

The paradigm for estimating the tails of Q(u), F(x), and f(x) from a random
sample is as follows:

1. From a random sample $X_1, \ldots, X_n$, choose, as a function of n, a
threshold percentile $t_n$ close to zero.

2. Estimate the corresponding threshold $T_n = \tilde{Q}(1 - t_n)$.

3. Obtain parameter estimates $(\hat{p}, \hat{a})$ from the exceedences $X_i - \tilde{Q}(1 - t_n)$
for all $X_i > \tilde{Q}(1 - t_n)$.

4. Estimate the tails of the quantile function, distribution function, and
density function by

\[ \hat{Q}(u) = \tilde{Q}(1 - t_n) + \hat{a} \left[ -g\!\left( \frac{1-u}{t_n}; -\hat{p} \right) \right] \tag{2.3.5} \]

for $1 - t_n < u < 1$,

\[ \hat{F}(x) = 1 - t_n \, g^{-1}\!\left( -\frac{1}{\hat{a}} [x - \tilde{Q}(1 - t_n)]; -\hat{p} \right) \tag{2.3.6} \]

for $\tilde{Q}(1 - t_n) < x < Q(1)$,

\[ \hat{f}(x) = \frac{t_n}{\hat{a}} \, (g^{-1})'\!\left( -\frac{1}{\hat{a}} [x - \tilde{Q}(1 - t_n)]; -\hat{p} \right) \tag{2.3.7} \]

for $\tilde{Q}(1 - t_n) < x < Q(1)$.

The estimates for p and a proposed in Sections 3-5 will be shown to have
asymptotically normal distributions given $T_n$ as $n t_n^* \to \infty$, where $t_n^* = 1 - F(T_n)$.
Therefore, the asymptotic normality of the tail estimates follows since they are
functions of asymptotically normal random variables.
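
Step 4 of the paradigm can be sketched in Python once a threshold and parameter estimates are available. The sketch below only evaluates the closed-form tail estimates with the error terms set to zero; how the estimates of p and a are obtained is the subject of Sections 3-5 and is not shown. All numerical inputs and function names are hypothetical.

```python
import math

def box_cox(z, lam):
    """g(z; lam) = (z**lam - 1)/lam, or ln z when lam == 0 (z > 0)."""
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def inv_box_cox(z, lam):
    """g^{-1}(z; lam) = (1 + lam*z)**(1/lam), or exp(z) when lam == 0."""
    return math.exp(z) if lam == 0 else (1.0 + lam * z) ** (1.0 / lam)

def inv_box_cox_deriv(z, lam):
    """Derivative of g^{-1}(z; lam) with respect to z."""
    return math.exp(z) if lam == 0 else (1.0 + lam * z) ** (1.0 / lam - 1.0)

def tail_estimates(threshold, t_n, p_hat, a_hat):
    """Return (Q_hat, F_hat, f_hat), valid only beyond the threshold."""
    def Q_hat(u):                       # for 1 - t_n < u < 1
        return threshold + a_hat * (-box_cox((1.0 - u) / t_n, -p_hat))

    def F_hat(x):                       # for x above the threshold
        return 1.0 - t_n * inv_box_cox(-(x - threshold) / a_hat, -p_hat)

    def f_hat(x):                       # for x above the threshold
        return (t_n / a_hat) * inv_box_cox_deriv(-(x - threshold) / a_hat, -p_hat)

    return Q_hat, F_hat, f_hat

# Hypothetical inputs: threshold 10 at t_n = 0.1, estimates p = 0.5 and a = 2.
Q_hat, F_hat, f_hat = tail_estimates(10.0, 0.1, 0.5, 2.0)
x99 = Q_hat(0.99)
print(x99, F_hat(x99), f_hat(x99))
```

The three functions are mutually consistent by construction: the distribution estimate inverts the quantile estimate, and the density estimate is its derivative.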


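THEOREM 2.3.1. Suppose that conditional on $T_n$, with $t_n^* = 1 - F(T_n)$,

\[ \begin{bmatrix} \hat{p}_n \\ \hat{a}_n \end{bmatrix} \text{ is } AN\!\left( \begin{bmatrix} p_0 \\ a_0 \end{bmatrix}, \; \frac{1}{n t_n^*} \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} \right) \]

as $n t_n^* \to \infty$ for some $p_0 \ne 0$, $a_0 > 0$, and scalars $v_{ij}$ such that the covariance
matrix is positive definite.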

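(a) For $1 - t_n^* < u < 1$,

\[ \hat{Q}(u) \text{ is } AN\!\left( T_n + a_0 \left[ -g\!\left( \frac{1-u}{t_n^*}; -p_0 \right) \right], \; \frac{1}{n t_n^*} \, \sigma_Q^2(u) \right) \]

as $n t_n^* \to \infty$, where, writing $w = (1-u)/t_n^*$,

\[ \sigma_Q^2(u) = \frac{1}{p_0^2} \Big\{ v_{11} \frac{a_0^2}{p_0^2} \left[ w^{-p_0} (1 + p_0 \ln w) - 1 \right]^2
 - (v_{12} + v_{21}) \frac{a_0}{p_0} \left[ w^{-p_0} (1 + p_0 \ln w) - 1 \right] \left[ w^{-p_0} - 1 \right]
 + v_{22} \left[ w^{-p_0} - 1 \right]^2 \Big\}. \tag{2.3.8} \]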


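(b) For $T_n < x < Q(1)$,

\[ \hat{F}(x) \text{ is } AN\!\left( 1 - t_n^* \, g^{-1}\!\left( -\frac{1}{a_0}(x - T_n); -p_0 \right), \; \frac{t_n^*}{n} \, \sigma_F^2(x) \right) \]

as $n t_n^* \to \infty$, where, writing $y = 1 + (p_0/a_0)(x - T_n)$,

\[ \sigma_F^2(x) = \frac{y^{-2/p_0}}{p_0^4 a_0^2} \Big\{ v_{11} a_0^2 \left[ \ln y - 1 + y^{-1} \right]^2
 + (v_{12} + v_{21}) \, p_0 a_0 \left[ \ln y - 1 + y^{-1} \right] \left[ 1 - y^{-1} \right]
 + v_{22} \, p_0^2 \left[ 1 - y^{-1} \right]^2 \Big\}. \tag{2.3.9} \]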


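(c) For $T_n < x < Q(1)$,

\[ \hat{f}(x) \text{ is } AN\!\left( \frac{t_n^*}{a_0} \, (g^{-1})'\!\left( -\frac{1}{a_0}(x - T_n); -p_0 \right), \; \frac{t_n^*}{n} \, \sigma_f^2(x) \right) \]

as $n t_n^* \to \infty$, where, writing $y = 1 + (p_0/a_0)(x - T_n)$,

\[ \sigma_f^2(x) = \frac{y^{-2[(1/p_0) + 1]}}{p_0^4 a_0^4} \Big\{ v_{11} a_0^2 \left[ \ln y - (1 + p_0)(1 - y^{-1}) \right]^2
 + (v_{12} + v_{21}) \, p_0 a_0 \left[ \ln y - (1 + p_0)(1 - y^{-1}) \right] \left[ 1 - (1 + p_0) y^{-1} \right]
 + v_{22} \, p_0^2 \left[ 1 - (1 + p_0) y^{-1} \right]^2 \Big\}. \tag{2.3.10} \]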


2.4.  Expected  Values  For  Functions  Of  The  Exceedences 

In determining the properties of estimators proposed in Sections 3-5, the
expectations of certain functions of the exceedences are needed. In the quantile
domain, moments are easily found noting the relation

\[ E_{X-T \mid X>T}[\psi(X - T)] = \int_0^1 \psi\!\left( Q_{X-T \mid X>T}(u) \right) du \]

for any function $\psi(\cdot)$ where $E_{X-T \mid X>T}|\psi(X - T)| < \infty$.
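
The quantile-domain expression for an expectation is easy to check numerically. The sketch below approximates the integral by a midpoint rule; the unit exponential model is an assumed example (its exceedences over any threshold are again unit exponential), and the function name and grid size are illustrative choices.

```python
import math

def expectation_via_quantile(psi, Q, n_grid=100_000):
    """Midpoint-rule approximation of the integral of psi(Q(u)) over 0 < u < 1."""
    h = 1.0 / n_grid
    return h * sum(psi(Q((k + 0.5) * h)) for k in range(n_grid))

def exp_Q(u):
    """Quantile function of the unit exponential model."""
    return -math.log(1.0 - u)

mean = expectation_via_quantile(lambda x: x, exp_Q)            # exact value is 1
second = expectation_via_quantile(lambda x: x * x, exp_Q)      # exact value is 2
print(mean, second)
```

The midpoint rule is used because it never evaluates the quantile function at u = 1, where Q(u) is unbounded for long tailed models.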

The proof of the following theorem uses this quantile expression for
expectation, the representation for the quantile function of the exceedences, and the
rate  of  convergence  result  for  the  deterministic  error  function. 


THEOREM  2.4.1.  Suppose  that  fQ(  1  — u)  =  up+lL(u),  where  p  6  It,  p  ^  0 


and 


m 


L{tu) 


-  1 


for  some  positive  measurable  functions  A(u)  and  R(t)  where  lim(_o+  m  =  0. 
For  a  given  threshold  value  Q(0)  <  T  <  Q(l),  let  t*  =  1  -  F(T).  Further,  let 
a  =  a(t*)  be  the  scalar  value  of  a  function  a(-)  satisfying  a(f*)hQ(l  -  t*)  = 
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l  +  0(J2i(f*))  for  some  positive  measurable  function  with  lim(_Q+  #1 (0  = 

0.  Also,  suppose  —p[Q{  1)  —  T]hQ(l  —  t*)  =  1  +  0(R2(t*))  for  some  pos¬ 
itive  measurable  function  i?2(0  tinth  limt_o+  i?2(0  ==  0-  Finally,  suppose 
p  T  hQ(  1  —  t*)  =  1  +  0(R$(t*))  for  some  positive  measurable  function  R$(t) 
with  limt_0+  R${t)  =  0.  Then 

E  x-T  |  :>T  [l  +  ?(Jf-r)]a[ln{n-?(X-r)}]# 

=  /  (1  -  [ln(l  -  «)-'>]'* du  +  0(*;(f)), 

JO 

E  x-T  I  X>T  1  - 

E  X-T  |  X>T  =Z^TT  +  0(^(^))> 

E  X-r  |  Jf>T  In  ^1  +  =  J^[ln{l-u)~p}adu  +  0{RZ(t*)), 

where $R_1^*(t) = \max\{R(t), R_1(t)\}$, $R_2^*(t) = \max\{R(t), R_2(t)\}$, and $R_3^*(t) = \max\{R(t), R_3(t)\}$.


The  proof  of  this  theorem  is  given  in  Appendix  C. 


3.  PARAMETER  ESTIMATES  FROM 
THE  GENERALIZED  PARETO  DISTRIBUTION 


3.1  Method 


One approach to estimating p and a is to treat the exceedences as a random sample from a parametric model suggested by the conditional distribution of the exceedences. The first of two possible parametric models can be motivated as follows. From (2.3.1),

$$Q_{X-T \mid X>T}(u; T) = a(t^*)\left[-g(1-u; -p) + e(t^*, 1-u, p)\right].$$

Taking $a = a(t^*)$ as a scalar given T and $e(t,u,p) = 0$ for all t, u, p suggests the Generalized Pareto Distribution (GPD) model defined below. The GPD model for tail estimates was first proposed by Pickands (1975).

A random variable $W \sim \mathrm{GPD}(p,a)$ with $p \in \mathbb{R}$, $a > 0$ if it has quantile function

$$Q_{GPD}(u; p, a) = -a \cdot g(1-u; -p).$$


Notice  that  the  GPD  can  also  be  naturally  referred  to  as  the  Power  Uniform 
Distribution  since  it  can  be  derived  by  taking  the  power  transformation  of  a 
Uniform(0,l)  random  variable. 

By inverting the quantile function, the distribution function is

$$F_{GPD}(w; p, a) = \begin{cases} 1 - \left(1 + \dfrac{pw}{a}\right)^{-1/p}, & p < 0,\ 0 < w < -a/p \\ 1 - e^{-w/a}, & p = 0,\ w > 0 \\ 1 - \left(1 + \dfrac{pw}{a}\right)^{-1/p}, & p > 0,\ w > 0, \end{cases}$$

and it follows that the density function is

$$f_{GPD}(w; p, a) = \begin{cases} \dfrac{1}{a}\left(1 + \dfrac{pw}{a}\right)^{(-1/p)-1}, & p < 0,\ 0 < w < -a/p \\ \dfrac{1}{a}\,e^{-w/a}, & p = 0,\ w > 0 \\ \dfrac{1}{a}\left(1 + \dfrac{pw}{a}\right)^{(-1/p)-1}, & p > 0,\ w > 0. \end{cases}$$

To obtain parameter estimates, choose, as a function of n, a threshold percentile $t_n$. Then let the threshold value be given by $T_n = \tilde{Q}(1 - t_n)$. Compute the exceedences $X_i - \tilde{Q}(1 - t_n) = X_i - T_n$ for all $X_i > \tilde{Q}(1 - t_n) = T_n$ and treat them as a random sample from a GPD(p, a).
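The definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code: the power transformation is assumed to be $g(u; p) = (u^p - 1)/p$ with limit $\ln u$ at $p = 0$ (the form that reproduces the distribution function given above), the threshold is taken to be an order statistic, and all function names are hypothetical.

```python
import math
import random

def g(u, p):
    """Assumed power transformation g(u; p) = (u**p - 1)/p, with ln(u) at p = 0."""
    return math.log(u) if p == 0 else (u ** p - 1.0) / p

def gpd_quantile(u, p, a):
    """Q_GPD(u; p, a) = -a * g(1 - u; -p)."""
    return -a * g(1.0 - u, -p)

def gpd_cdf(w, p, a):
    """F_GPD(w; p, a), the inverse of the quantile function on its support."""
    if p == 0:
        return 1.0 - math.exp(-w / a)
    return 1.0 - (1.0 + p * w / a) ** (-1.0 / p)

def exceedances(sample, t_n):
    """Exceedences over the sample quantile at 1 - t_n (an order statistic)."""
    xs = sorted(sample)
    k = int(len(xs) * t_n)              # [n t_n], the number of exceedences
    threshold = xs[len(xs) - k - 1]
    return threshold, [x - threshold for x in xs[len(xs) - k:]]

# Inverse-transform sampling: a power transformation of a Uniform(0,1) variable.
rng = random.Random(0)
sample = [gpd_quantile(rng.random(), 0.25, 2.0) for _ in range(1000)]
threshold, excess = exceedances(sample, 0.10)
```

With n = 1000 and $t_n = .10$ this keeps the largest 100 observations, measured from the threshold.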

3.2  Parameters  Estimated  By  Maximum  Likelihood 


In  other  papers  using  the  GPD  to  estimate  tail  behavior,  maximum  like¬ 
lihood  estimation  is  most  popular.  For  example,  DuMouchel  (1983),  Davison 
(1984),  R.  L.  Smith  (1984,  1987),  J.  A.  Smith  (1986),  and  Joe  (1987)  propose 
maximum  likelihood  to  obtain  the  GPD  parameter  estimates. 

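Assuming the exceedences are a random sample from a GPD, the maximum likelihood estimates of p and a are the values which maximize the log-likelihood

$$\mathcal{L}_{GPD}(p, a; Y) = \begin{cases} -[nt_n]\ln a - \left(\dfrac{1}{p} + 1\right)\displaystyle\sum_{i=1}^{[nt_n]} \ln\left(1 + \dfrac{pY_i}{a}\right), & p < 0,\ a > -p \cdot Y([nt_n];[nt_n]) \\ -[nt_n]\ln a - \dfrac{1}{a}\displaystyle\sum_{i=1}^{[nt_n]} Y_i, & p = 0,\ a > 0 \\ -[nt_n]\ln a - \left(\dfrac{1}{p} + 1\right)\displaystyle\sum_{i=1}^{[nt_n]} \ln\left(1 + \dfrac{pY_i}{a}\right), & p > 0,\ a > 0, \end{cases}$$

where $[\cdot]$ denotes the greatest integer operation and $Y_i = X_i - \tilde{Q}(1 - t_n)$ for all $X_i > \tilde{Q}(1 - t_n)$, with $Y([nt_n];[nt_n]) = \max\{Y_1, \ldots, Y_{[nt_n]}\}$.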


3.2.1  Asymptotic  Properties.  The  asymptotic  properties  of  these  estimators 
do  not  follow  directly  from  large  sample  maximum  likelihood  theory  since  the 
exceedences  are  not  a  sample  from  a  GPD  in  general.  To  derive  these  results, 
first  express  the  estimators  as  solutions  to  a  set  of  estimating  equations,  take  the 
Taylor’s  series  expansion,  and  then  compute  the  asymptotic  distribution  of  each 
term.  This  approach  yields  the  following  result. 


THEOREM 3.2.1. Suppose that $fQ(1-u) = u^{p+1}L(u)$, where $-\tfrac{1}{2} < p < \infty$, $p \neq 0$ and

$$\left| \frac{L(t)}{L(tu)} - 1 \right| < A(u)R(t)$$

for some positive measurable functions $A(u)$ and $R(t)$ where $\lim_{t \to 0^+} R(t) = 0$. Let $\{T_n\}$ be a sequence of threshold values defined on $(Q(0), Q(1))$ such that $nt_n^* \to \infty$ and $t_n^* \to 0$ as $n \to \infty$, where $t_n^* = 1 - F(T_n)$. Further, let $a = a(t_n^*)$ be the scalar value of a function $a(\cdot)$ satisfying $a(t_n^*)\,hQ(1 - t_n^*) = 1 + O(R_1(t_n^*))$ for some positive measurable function $R_1(t)$ with $\lim_{t \to 0^+} R_1(t) = 0$. Let $(\hat{p}_n, \hat{a}_n)$ denote the maximum likelihood estimates from the GPD model for the exceedences. Then conditional on $T_n$,

$$\begin{bmatrix} \hat{p}_n \\ \hat{a}_n \end{bmatrix} \text{ is } AN\left( \begin{bmatrix} p + O(R^*(t_n^*)) \\ a + O(R^*(t_n^*)) \end{bmatrix},\ ([nt_n])^{-1}\,V_{GPD} \right)$$

as $nt_n^* \to \infty$, where $R^*(t) = \max\{R(t), R_1(t)\}$ and

$$V_{GPD} = \begin{bmatrix} (p+1)^2 + O(R^*(t_n^*)) & -a(p+1) + O(R^*(t_n^*)) \\ -a(p+1) + O(R^*(t_n^*)) & 2a^2(p+1) + O(R^*(t_n^*)) \end{bmatrix}.$$


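PROOF. Let $(\hat{p}_n, \hat{a}_n)$ denote the maximum likelihood estimates derived assuming the exceedences over the threshold $T_n$ are a random sample from a GPD. Then $(\hat{p}_n, \hat{a}_n)$ is the solution to

$$0 = ([nt_n])^{-1/2} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GPD}(\hat{p}_n, \hat{a}_n; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GPD}(\hat{p}_n, \hat{a}_n; X_i - T_n)}{\partial a} \end{bmatrix}.$$

Take the first order Taylor's series expansion about the true parameter values $(p, a)$ of the right hand side to obtain, for some point $(p^*, a^*)$ on the line segment between $(\hat{p}_n, \hat{a}_n)$ and $(p, a)$,

$$0 = ([nt_n])^{-1/2} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} + ([nt_n])^{-1} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial^2 \mathcal{L}_{GPD}(p^*, a^*; X_i - T_n)}{\partial p^2} & \dfrac{\partial^2 \mathcal{L}_{GPD}(p^*, a^*; X_i - T_n)}{\partial p\,\partial a} \\[1.5ex] \dfrac{\partial^2 \mathcal{L}_{GPD}(p^*, a^*; X_i - T_n)}{\partial p\,\partial a} & \dfrac{\partial^2 \mathcal{L}_{GPD}(p^*, a^*; X_i - T_n)}{\partial a^2} \end{bmatrix} \cdot ([nt_n])^{1/2} \begin{bmatrix} \hat{p}_n - p \\ \hat{a}_n - a \end{bmatrix}.$$

From the Central Limit Theorem for an iid sequence,

$$([nt_n])^{-1/2} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} \text{ is } AN\left(\mu_n,\ ([nt_n])^{-1}\,\Sigma_n\right)$$

as $nt_n^* \to \infty$, where

$$\mu_n = E_{X-T_n \mid X>T_n} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} = \begin{bmatrix} O(R^*(t_n^*)) \\ O(R^*(t_n^*)) \end{bmatrix}$$

and

$$\Sigma_n = \mathrm{Cov}_{X-T_n \mid X>T_n} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GPD}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} = \begin{bmatrix} \dfrac{2}{(p+1)(2p+1)} + O(R^*(t_n^*)) & \dfrac{1}{a(p+1)(2p+1)} + O(R^*(t_n^*)) \\[1.5ex] \dfrac{1}{a(p+1)(2p+1)} + O(R^*(t_n^*)) & \dfrac{1}{a^2(2p+1)} + O(R^*(t_n^*)) \end{bmatrix}.$$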


This result follows from the expression of the gradient vector for the GPD log-likelihood which is given in Appendix D and the moments given in Theorem 2.4.1.

For n sufficiently large to make the $O(R^*(t_n^*))$ terms negligible, notice that $\Sigma_n$ is positive definite if and only if $p > -\tfrac{1}{2}$. The case $p < -\tfrac{1}{2}$ is treated in detail by Smith (1987), but the important change is that the asymptotic distribution of the GPD maximum likelihood estimates is no longer normal, nor is the rate

If $\Sigma_n$ is positive definite, then from the Weak Law of Large Numbers and the fact that $(p^*, a^*) \xrightarrow{P} (p, a)$ as $nt_n^* \to \infty$, it follows that


(»*;) 

(Kir1  E 


•=i 


dp 2 
dpda 


a2£GPD(r,a‘;^-r 

dpda 

^lorotp*.  -  T„) 
da 2 


0 

1 


as $nt_n^* \to \infty$. This result follows from the expression of the Hessian matrix for the GPD log-likelihood which is given in Appendix D and the moments given in Theorem 2.4.1.

The asymptotic distribution of $(\hat{p}_n, \hat{a}_n)$ then follows from Slutsky's Theorem, observing that $V_{GPD} = \Sigma_n^{-1}$. □


3.2.2  Computational  Aspects  of  GPD  Maximum  Likelihood  Estimation.  This 
subsection  contains  a  detailed  investigation  into  the  problem  of  maximizing  the 
GPD  log-likelihood  over  the  parameter  space. 

Suppose that $Y_1, \ldots, Y_k$ is a random sample from the GPD with largest value $Y(k;k)$. The log-likelihood is given by

$$\mathcal{L}_{GPD}(p, a; Y) = \begin{cases} -k\ln a - \left(\dfrac{1}{p} + 1\right)\displaystyle\sum_{i=1}^{k} \ln\left(1 + \dfrac{pY_i}{a}\right), & p < 0,\ a > -p \cdot Y(k;k) \\ -k\ln a - \dfrac{1}{a}\displaystyle\sum_{i=1}^{k} Y_i, & p = 0,\ a > 0 \\ -k\ln a - \left(\dfrac{1}{p} + 1\right)\displaystyle\sum_{i=1}^{k} \ln\left(1 + \dfrac{pY_i}{a}\right), & p > 0,\ a > 0. \end{cases}$$

If $p < -1$, there is no maximum likelihood estimate since for any $p < -1$, $\lim_{a \to -p \cdot Y(k;k)^+} \mathcal{L}_{GPD}(p, a; Y) = \infty$. In order to obtain a finite maximum of the GPD log-likelihood, the constraint $p \geq -1$ must be imposed.
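The unboundedness for $p < -1$ can be seen numerically. The sketch below evaluates the log-likelihood above at $p = -1.5$ as a approaches $-p \cdot Y(k;k)$ from above; the data values and the function name `gpd_loglik` are hypothetical and chosen only for illustration.

```python
import math

def gpd_loglik(p, a, y):
    """GPD log-likelihood; returns -inf outside the parameter space."""
    k = len(y)
    if a <= 0 or (p < 0 and a <= -p * max(y)):
        return -math.inf
    if p == 0:
        return -k * math.log(a) - sum(y) / a
    return -k * math.log(a) - (1.0 / p + 1.0) * sum(math.log1p(p * yi / a) for yi in y)

y = [0.3, 1.1, 0.7, 2.4, 0.2]            # hypothetical exceedences
p = -1.5                                  # a value below the boundary p = -1
bound = -p * max(y)                       # a must exceed -p * Y(k;k)
# The log-likelihood keeps growing as a decreases toward the bound.
values = [gpd_loglik(p, bound * (1.0 + step), y) for step in (1e-2, 1e-4, 1e-8)]
# At p = -1 the log-likelihood reduces to -k ln a, with supremum -k ln Y(k;k).
boundary_value = -len(y) * math.log(max(y))
```

At $p = -1$ the factor $(1/p + 1)$ vanishes, which is why the boundary value depends on the data only through $Y(k;k)$.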

There are (in most instances) three values of $(p, a)$ which are candidates for the GPD maximum likelihood estimator. The first of these involves the boundary value $p = -1$ due to the above constraint on the domain of $\mathcal{L}_{GPD}(\cdot)$. Given $p = -1$, the GPD log-likelihood is maximized at $\hat{a} = Y(k;k)$. This follows since $\mathcal{L}_{GPD}(p = -1, a; Y) = -k\ln a$ is maximized as $a \to -p \cdot Y(k;k)^+ = Y(k;k)^+$. The problem is complicated by the optimization being taken over an open set, but it is treated as a maximum taken over a closed set. Any relative maxima found over the domain of $\mathcal{L}_{GPD}(\cdot)$ must exceed the GPD log-likelihood evaluated at this boundary in order to be the maximum likelihood estimator.

Figure 2 shows a graph of the GPD log-likelihood function (with a slight modification to permit its definition on the grid required for the graphing routine) for a generated random sample from the GPD. Clearly, there exist relative maxima and minima on the domain of $\mathcal{L}_{GPD}(p, a; Y)$ whose values are found by applying the principles of calculus.

FIG. 2. Graph of the GPD log-likelihood function for a generated random sample. The function has been slightly modified to permit its definition on the grid required for the graphing routine. Notice that there exist relative maxima and minima, implying multiple roots of the gradient vector over the two-dimensional parameter space.

Consider the space defined by $A = \{-1 < p < 0,\ a > -p \cdot Y(k;k)\} \cup \{p > 0,\ a > 0\}$. For some $(p, a) \in A$, the gradient vector is equal to zero. Using the expression for the gradient vector of the GPD log-likelihood given in Appendix D, the solution to the simultaneous equations may be simplified and written as


d^GPD  [p,a',Y) 

dp 

d-£GPp(p»a;y) 

da 


M4+i)  =  £ln  (1  +  ^) 

(>+f) 

*=w+DE(i+f  r 

»=1  '  ' 
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The  bivariate  search  for  the  zeroes  of  the  gradient  vector  over  A  can  be 
reduced to a univariate search since the second equation is a closed form representation for the estimator of p given the ratio $p/a$, and the first equation depends only on $p/a$. Therefore, the zeroes of the gradient of the log-likelihood of the GPD are the solution(s) to $h(p/a) = 0$, where the function $h(\cdot)$ is defined by

$$h(\eta) = \left[1 + \frac{1}{k}\sum_{i=1}^{k} \ln(1 + \eta Y_i)\right]\left[\frac{1}{k}\sum_{i=1}^{k} (1 + \eta Y_i)^{-1}\right] - 1$$

with domain $\{\eta > -1/Y(k;k),\ \eta \neq 0\}$.

An example of the function $h(\eta)$ is given in Figure 3 for the random sample from the GPD used in Figure 2. Notice that there are two roots of $h(\cdot)$ in this example, only one of which corresponds to the local maximum. It is easily shown that $h(\cdot)$ is continuous at zero since $\lim_{\eta \to 0} h(\eta) = 0$.

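A more important consequence of this limit is that $\eta \to 0$ gives the second possible value for the GPD maximum likelihood estimate. The limit $\eta \to 0$ corresponds to the case $p = 0$, where $\mathcal{L}_{GPD}(\cdot)$ is only a function of a. The extremum $\hat{p} = 0$, $\hat{a} = \bar{Y}$, where $\bar{Y} = k^{-1}\sum_{i=1}^{k} Y_i$, follows from solving

$$\frac{\partial \mathcal{L}_{GPD}(p = 0, a; Y)}{\partial a} = -\frac{k}{a} + \frac{1}{a^2}\sum_{i=1}^{k} Y_i = 0,$$

which is a local maximum if

$$\frac{\partial^2 \mathcal{L}_{GPD}(p = 0, a; Y)}{\partial a^2} = \frac{k}{a^2} - \frac{2}{a^3}\sum_{i=1}^{k} Y_i < 0,$$

that is, if $a < 2\bar{Y}$, which holds at $\hat{a} = \bar{Y}$.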


FIG. 3. Graph of the function $h(\eta)$ used to simplify the search for the zeroes of the gradient vector on the two-dimensional parameter space for a GPD random sample. Notice that there are two zeroes of the function, only one of which corresponds to the local maximum.

The third possible value for the maximum likelihood estimate which exists in most cases must be found numerically. Subsection 3.2.3 describes an algorithm which determines if such a root exists and, assuming it does exist, finds it using the bisection root finding algorithm. If such a root $\hat{\eta}$ is found, the third possible value for the GPD maximum likelihood estimate can be computed from

$$\hat{p} = \frac{1}{k}\sum_{i=1}^{k} \ln(1 + \hat{\eta} Y_i), \qquad \hat{a} = \frac{\hat{p}}{\hat{\eta}}.$$

This relative extremum must be verified to be a local maximum by considering the Hessian matrix of the GPD log-likelihood given in Appendix D. The point $(\hat{p}, \hat{a})$ is a local maximum if the Hessian matrix evaluated at the estimators is negative definite.


3.2.3 Proposed Algorithm For The GPD Maximum Likelihood Estimator. Hosking and Wallis (1987) attempted to find the GPD maximum likelihood estimator using Newton-Raphson optimization in two dimensions and found that their algorithm failed to converge to a local maximum with alarming frequency. The table of failures to converge given by Hosking and Wallis (1987) is reproduced in Table 2.

Table 2

Failure rate of the Newton-Raphson optimization in two dimensions of the log-likelihood of the GPD, reproduced from Hosking and Wallis (1987). Tabulated values are the number of failures to converge of Newton-Raphson per 100 simulated samples. The GPD log-likelihood has a maximum, but this algorithm fails far too frequently to be considered reliable in practice.

          p = -.4   p = -.2   p = 0   p = .2   p = .4
n = 15     41.7      22.7     12.2     4.8      3.6
n = 25     14.6       4.7      1.5      .3       .2

The observed failure to converge of the Newton-Raphson algorithm has two explanations. First, the root of the gradient vector for the GPD log-likelihood may not satisfy the second order conditions to be a local maximum. The GPD log-likelihood often has two zeroes of the gradient vector, so the root obtained by the Newton-Raphson algorithm using a particular initial value may not be the local maximum.

Second, in terms of the parameters $(p, a)$, the root of the gradient vector corresponding to $\eta \to 0$ is $p = 0$. However, it is interesting to note that a numerical optimization routine in two dimensions may increase a at each iteration in an attempt to find the zero of the gradient vector. Such behavior is due to the reparameterization $\eta = p/a$, where $\eta$ can be made arbitrarily close to zero by letting $a \to \infty$. Therefore, the gradient vector may always be made closer to zero with a larger value of a. The Newton-Raphson algorithm in two dimensions can continually increase a until reaching an upper bound for the iterations which then signals that the algorithm failed to converge.

An algorithm which computes each of the three possible values for the GPD maximum likelihood estimator discussed in Subsection 3.2.2 and then selects the maximum of the GPD log-likelihood is given by:

• Choose a $\delta$ such that for $|\eta| < \delta$, it will be considered that $\eta = 0$ and there is a lone solution to $h(\eta) = 0$. For example, let $\delta = .0001$.

• Choose an $\epsilon$ to be used as a convergence criterion such that for $|h(\hat{\eta})| < \epsilon$, it will be considered that $\hat{\eta}$ satisfies $h(\hat{\eta}) = 0$. For example, let $\epsilon = .0001$.

• Compute $h(-\delta)$ and $h(\delta)$.

• If $h(-\delta) < 0$, then:

1. Let $\eta^{(L)} = [-1/Y(k;k)] \cdot [i/(i+1)]$ for the smallest $i \in \{1, 2, \ldots, M\}$ such that $h(\eta^{(L)}) > 0$, where M is a specified bound on the number of iterations. If there is no $\eta^{(L)}$ such that $h(\eta^{(L)}) > 0$, then a nonzero root cannot be found numerically.

2. Let $\eta^{(U)} = -\delta$.

3. Use the bisection algorithm to find $\hat{\eta}$ such that $|h(\hat{\eta})| < \epsilon$.

• If $h(\delta) > 0$, then:

1. Let $\eta^{(L)} = \delta$.

2. Let $\eta^{(U)} = i$ for the smallest $i \in \{1, 2, \ldots, M\}$ such that $h(\eta^{(U)}) < 0$, where M is a specified bound on the number of iterations. If there is no $\eta^{(U)}$ such that $h(\eta^{(U)}) < 0$, then a nonzero root cannot be found numerically.

3. Use the bisection algorithm to find $\hat{\eta}$ such that $|h(\hat{\eta})| < \epsilon$.

• If a nonzero $\hat{\eta}$ is found such that $|h(\hat{\eta})| < \epsilon$, then:

1. Compute $\hat{p}_1 = (1/k)\sum_{i=1}^{k} \ln(1 + \hat{\eta} Y_i)$ and $\hat{a}_1 = \hat{p}_1/\hat{\eta}$.

2. Compute the Hessian at $(\hat{p}_1, \hat{a}_1)$.

3. If the Hessian at $(\hat{p}_1, \hat{a}_1)$ is negative definite, then $(\hat{p}_1, \hat{a}_1)$ is a local maximum of the GPD log-likelihood.

• The point $\hat{p}_2 = 0$ and $\hat{a}_2 = \bar{Y}$ is also a local maximum of the GPD log-likelihood in a, since $\hat{a}_2 < 2\bar{Y}$.

• The relative maximum is $(\hat{p}_1, \hat{a}_1)$ if $\mathcal{L}_{GPD}(\hat{p}_1, \hat{a}_1; Y) > \mathcal{L}_{GPD}(\hat{p}_2, \hat{a}_2; Y)$. Otherwise, the relative maximum is $(\hat{p}_2, \hat{a}_2)$.

• If $\mathcal{L}_{GPD}(\hat{p}, \hat{a}; Y) > -k\ln Y(k;k)$ where $(\hat{p}, \hat{a})$ denote the relative maximum, then $\hat{p}$ and $\hat{a}$ are the GPD maximum likelihood estimates. Otherwise, the boundary maxima $\hat{p} = -1$ and $\hat{a} = Y(k;k)$ are the GPD maximum likelihood estimates.
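The steps above can be sketched as follows. This is an illustrative sketch under several assumptions, not the author's implementation: $h(\eta)$ is taken in the form $[1 + k^{-1}\sum \ln(1 + \eta Y_i)][k^{-1}\sum (1 + \eta Y_i)^{-1}] - 1$ obtained by eliminating p from the likelihood equations; the Hessian is derived by hand from the log-likelihood because Appendix D is not reproduced here; the $p = 0$ candidate uses $\hat{a} = \bar{Y}$; $\delta$ is assumed smaller than $1/(2Y(k;k))$; and all function names are hypothetical.

```python
import math
import random

def h(eta, y):
    """h(eta) = [1 + mean ln(1 + eta*Y_i)] * [mean 1/(1 + eta*Y_i)] - 1."""
    k = len(y)
    mean_log = sum(math.log1p(eta * yi) for yi in y) / k
    mean_inv = sum(1.0 / (1.0 + eta * yi) for yi in y) / k
    return (1.0 + mean_log) * mean_inv - 1.0

def loglik(p, a, y):
    """GPD log-likelihood at a point inside the parameter space."""
    if p == 0:
        return -len(y) * math.log(a) - sum(y) / a
    return -len(y) * math.log(a) - (1.0 / p + 1.0) * sum(math.log1p(p * yi / a) for yi in y)

def hessian_negative_definite(p, a, y):
    """Hand-derived 2x2 Hessian of the log-likelihood (p != 0), tested for negative definiteness."""
    z = [yi / a for yi in y]
    u = [1.0 + p * zi for zi in z]
    s = sum(math.log(ui) for ui in u)
    t1 = sum(zi / ui for zi, ui in zip(z, u))
    t2 = sum((zi / ui) ** 2 for zi, ui in zip(z, u))
    t3 = sum(zi * (1.0 + ui) / ui ** 2 for zi, ui in zip(z, u))
    h_pp = -2.0 * s / p ** 3 + 2.0 * t1 / p ** 2 + (1.0 / p + 1.0) * t2
    h_pa = (t1 - (1.0 + p) * t2) / a
    h_aa = (len(y) - (1.0 + p) * t3) / a ** 2
    return h_pp < 0 and h_pp * h_aa - h_pa ** 2 > 0

def bisect(f, lo, hi, eps, max_iter=200):
    """Bisection on a bracket with f(lo)*f(hi) < 0, stopping when |f(mid)| < eps."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if abs(f_mid) < eps:
            return mid
        if f_mid * f_lo < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return None

def gpd_mle(y, delta=1e-4, eps=1e-4, M=100):
    """Select among the interior root(s), the p = 0 candidate and the p = -1 boundary."""
    k, y_max = len(y), max(y)
    f = lambda eta: h(eta, y)
    brackets = []
    if f(-delta) < 0:                       # search for a root with eta < 0
        for i in range(1, M + 1):
            lower = (-1.0 / y_max) * i / (i + 1.0)
            if f(lower) > 0:
                brackets.append((lower, -delta))
                break
    if f(delta) > 0:                        # search for a root with eta > 0
        for i in range(1, M + 1):
            if f(float(i)) < 0:
                brackets.append((delta, float(i)))
                break
    candidates = [(0.0, sum(y) / k)]        # p = 0 candidate with a = sample mean
    for lo, hi in brackets:
        eta = bisect(f, lo, hi, eps)
        if eta is not None:
            p1 = sum(math.log1p(eta * yi) for yi in y) / k
            a1 = p1 / eta
            if hessian_negative_definite(p1, a1, y):
                candidates.append((p1, a1))
    best = max(candidates, key=lambda c: loglik(c[0], c[1], y))
    if loglik(best[0], best[1], y) > -k * math.log(y_max):
        return best
    return (-1.0, y_max)                    # boundary maximum

rng = random.Random(1)                      # simulated GPD(p = 0.2, a = 2) exceedences
y = [(2.0 / 0.2) * ((1.0 - rng.random()) ** -0.2 - 1.0) for _ in range(2000)]
p_hat, a_hat = gpd_mle(y)
```

Because the search is one-dimensional in $\eta$, the sketch never iterates on a directly, which avoids the drift toward $a \to \infty$ described for the two-dimensional Newton-Raphson search.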

It should be noted that the bisection algorithm is preferred for the numerical root finding because it ensures that the nonzero root will be found if it exists. The bisection algorithm assumes two values $\eta^{(L)}$ and $\eta^{(U)}$ are given such that $h(\eta^{(L)}) \cdot h(\eta^{(U)}) < 0$, and a convergence criterion $\epsilon$ (which was defined earlier) is specified. Given these values, the bisection algorithm is defined as:

• Compute $\hat{\eta} = (\eta^{(L)} + \eta^{(U)})/2$.

• If $|h(\hat{\eta})| < \epsilon$ then terminate the algorithm, returning $\hat{\eta}$.

• If $|h(\hat{\eta})| \geq \epsilon$ then

1. If $h(\hat{\eta}) \cdot h(\eta^{(L)}) < 0$ then set $\eta^{(U)} = \hat{\eta}$ and repeat the algorithm.

2. If $h(\hat{\eta}) \cdot h(\eta^{(U)}) < 0$ then set $\eta^{(L)} = \hat{\eta}$ and repeat the algorithm.

Typically, the bisection algorithm is criticized for its slow rate of convergence. However, in the application of this algorithm in simulation studies, the nonzero root of $h(\cdot)$ is most often found in five to ten iterations.
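The stopping rule $|h(\hat{\eta})| < \epsilon$ explains the small iteration counts: the search ends as soon as the function value is small, not when the bracket is narrow. A minimal sketch that also reports the number of iterations is given below; the test function $\cos x - x$ is an arbitrary stand-in and is not taken from the text.

```python
import math

def bisection(h, eta_lower, eta_upper, eps=1e-4, max_iter=100):
    """Bisection with the |h| < eps stopping rule; returns (root, iterations used)."""
    if h(eta_lower) * h(eta_upper) >= 0:
        raise ValueError("the end points must bracket a sign change")
    for iteration in range(1, max_iter + 1):
        eta = 0.5 * (eta_lower + eta_upper)
        value = h(eta)
        if abs(value) < eps:
            return eta, iteration
        if value * h(eta_lower) < 0:
            eta_upper = eta      # sign change lies in the lower half
        else:
            eta_lower = eta      # sign change lies in the upper half
    raise RuntimeError("no convergence within max_iter iterations")

root, iterations = bisection(lambda x: math.cos(x) - x, 0.0, 1.0)
```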

3.3  Parameters  Estimated  From  The  Sample  Quantile  Process 

Other parameter estimates can be obtained by applying the theory of regression analysis on continuous-parameter time series from the reproducing kernel Hilbert space (RKHS) point of view given by Parzen (1961a,b, 1967) applied to the sample quantile process $\tilde{Q}(u)$. This approach follows the ideas in Parzen (1979) for estimating location and scale parameters.

Parzen  (1979)  motivates  this  approach  by  first  stating  the  following  theorem 
on  the  strong  approximation  of  the  quantile  process. 

THEOREM [Csörgő and Révész (1978)]. Let $\{\tilde{Q}(u),\ 0 < u < 1\}$ denote the sample quantile process of a random sample from a population with continuous distribution function $F_0(x)$, quantile function $Q_0(u)$, density function $f_0(x)$, and density-quantile function $f_0Q_0(1-u) = u^{p+1}L(u)$ for $p \in \mathbb{R}$ and $L(u)$ slowly varying as $u \to 0^+$. Let $\{\tilde{Q}_U(u),\ 0 < u < 1\}$ denote the quantile process of the uniformly distributed random variables $U_i = F_0(X_i)$. Let

$$R_n = \sup_{0<u<1} \sqrt{n}\,\left| f_0Q_0(u)\{\tilde{Q}(u) - Q_0(u)\} - \{\tilde{Q}_U(u) - u\} \right|.$$

Then almost surely,

$$R_n = \begin{cases} O(n^{-1/2} \ln\ln n), & \text{if } p < 0 \\ O(n^{-1/2} (\ln\ln n)^2), & \text{if } p = 0 \\ O(n^{-1/2} (\ln\ln n)^{p+1} (\ln n)^{p(1+\epsilon)}), & \text{if } p > 0 \end{cases}$$

for every $\epsilon > 0$.

Therefore, $f_0Q_0(u)\{\tilde{Q}(u) - Q_0(u)\}$ can be approximated by the uniform sample quantile process $\{\tilde{Q}_U(u),\ 0 < u < 1\}$ whose weak convergence is considered in the following theorem. Define a Brownian bridge to be a zero mean normal process with covariance kernel $K_B(s,t) = \min\{s,t\} - st$, $0 \leq s,t \leq 1$.

THEOREM [Csörgő and Révész (1975)]. A Brownian bridge $\{B_n(u),\ 0 \leq u \leq 1\}$ can be defined for each n such that, almost surely,

$$\sup_{0<u<1} \left| \sqrt{n}\{\tilde{Q}_U(u) - u\} - B_n(u) \right| = O(n^{-1/2} \ln n).$$

These two theorems are then interpreted for purposes of statistical inference to mean that $\sqrt{n}\,f_0Q_0(u)[\tilde{Q}(u) - Q_0(u)]$ is distributed as a Brownian bridge $B(u)$.

The GPD model for exceedences of a threshold assumes that for given $t_n$,

$$Q_{X-T_n \mid X>\tilde{Q}(1-t_n)=T_n}(u; t_n, T_n) = -a \cdot g(1-u; -p)$$

with

$$fQ_{X-T_n \mid X>\tilde{Q}(1-t_n)=T_n}(u; t_n, T_n) = \frac{(1-u)^{p+1}}{a}.$$

The sample quantile process for the exceedences for given $t_n$ is

$$\tilde{Q}_{X-T_n \mid X>\tilde{Q}(1-t_n)=T_n}(u; t_n, T_n) = \tilde{Q}(1 - t_n(1-u)) - \tilde{Q}(1 - t_n).$$

Therefore, estimating p and a becomes a problem in regression analysis of continuous-parameter time series by writing

$$(1-u)^{p+1}\left[\tilde{Q}(1 - t_n(1-u)) - \tilde{Q}(1 - t_n)\right] = -a\,(1-u)^{p+1} g(1-u; -p) + \sigma_B B(u)$$

where $\sigma_B = a/\sqrt{[nt_n]}$ is treated as a free parameter not constrained to be related to a.

Estimators can be formed from this time series regression analysis after a reproducing kernel inner product is found corresponding to the Brownian bridge covariance kernel $K_B(s,t)$. This RKHS consists of $L_2$ differentiable functions f, g on the interval $p \leq u \leq q$ with inner product

$$\langle f, g \rangle_{p,q} = \int_p^q f'(u)g'(u)\,du + \frac{1}{p} f(p)g(p) + \frac{1}{1-q} f(q)g(q).$$

Parzen (1979) proves the reproducing formula $\langle f, K_B(\cdot, t) \rangle_{p,q} = f(t)$ for $p \leq t \leq q$, which verifies that $\langle f, g \rangle_{p,q}$ is the reproducing kernel inner product.
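The reproducing formula can be checked numerically for a particular function. The sketch below assumes the inner product in the form $\int_p^q f'g'\,du + f(p)g(p)/p + f(q)g(q)/(1-q)$ and uses an arbitrary smooth test function; the names and the values of p, q and t are illustrative only.

```python
import math

def kernel(s, t):
    """Brownian bridge covariance kernel K_B(s, t) = min(s, t) - s*t."""
    return min(s, t) - s * t

def inner_product_with_kernel(f, df, t, p, q, n=20_000):
    """<f, K_B(., t)>_{p,q}. In u, the derivative of K_B(u, t) is 1 - t for u < t and -t for u > t."""
    def integrate(lo, hi, slope):
        # Midpoint rule for slope * integral of f'(u) over (lo, hi).
        step = (hi - lo) / n
        return slope * step * sum(df(lo + (i + 0.5) * step) for i in range(n))
    integral = integrate(p, t, 1.0 - t) + integrate(t, q, -t)
    return integral + f(p) * kernel(p, t) / p + f(q) * kernel(q, t) / (1.0 - q)

f = lambda u: math.sin(3.0 * u) + u * u          # arbitrary smooth test function
df = lambda u: 3.0 * math.cos(3.0 * u) + 2.0 * u  # its derivative
p, q, t = 0.1, 0.9, 0.35
reproduced = inner_product_with_kernel(f, df, t, p, q)
```

The value `reproduced` should agree with `f(t)` up to quadrature error; the integral is split at t because the kernel has a kink there.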

Applying  the  ideas  of  modeling  the  quantile  process  as  a  continuous  param¬ 
eter  time  series,  Parzen  (1979)  derives  optimal  estimates  (along  with  the  corre¬ 
sponding  influence  functions)  for  location  and  scale  parameters.  This  approach 
does  not  meet  with  the  same  success  in  the  GPD  modeling  for  tail  estimates 
since  the  quantile  process  is  expressed  as  a  nonlinear  function  of  the  parameters. 
However,  given  p  the  model  is  a  linear  function  in  a. 

Drawing from the applications paradigm of the Box-Cox (1964) transformation in regression, estimates of $(p, a)$ are found by:

1. Choose a reasonable range of values for p.

2. For each value $p^{(i)}$, compute, with $p = p^{(i)}$ in the regression function,

$$\hat{a}^{(i)}_{p,q} = -\frac{\left\langle (1-u)^{p+1} g(1-u; -p),\ (1-u)^{p+1}\left[\tilde{Q}(1 - t_n(1-u)) - \tilde{Q}(1 - t_n)\right] \right\rangle_{p,q}}{\left\langle (1-u)^{p+1} g(1-u; -p),\ (1-u)^{p+1} g(1-u; -p) \right\rangle_{p,q}}$$

and compute the value of a specified loss function $R(p^{(i)}, \hat{a}^{(i)}_{p,q})$.

3. Choose the estimate of $(p, a)$ which minimizes $R(\cdot)$.

4. PARAMETER ESTIMATES FROM

THE GENERALIZED EXTREME VALUE DISTRIBUTION

4.1  Method 

A second parametric model for the conditional distribution of the exceedences can be motivated as follows. From (2.3.1),

$$Q_{X-T \mid X>T}(u; T) = a(t^*)\left[-g(1-u; -p) + e(t^*, 1-u, p)\right].$$

Taking $a = a(t^*)$ as a scalar given T, $e(t,u,p) = 0$ for all t, u, p, and noticing that as $u \to 1^-$, $g(-\ln u; -p) \sim g(1-u; -p)$ suggests the Generalized Extreme Value Distribution (GEV) defined below. The GEV probability model is used extensively in practical applications involving floods and extreme sea levels where the random variable is the maximum value over a given time period. Jenkinson (1955), the Flood Studies Report [Natural Environment Research Council (NERC) (1975)], and Blackman and Graff (1978) give examples of GEV applications.

In this work, it is important to point out that the GEV is used as a model for the exceedences. This alters both the application of the model and the properties of the estimates from the usual scenarios where the entire sample is modeled as GEV.

A random variable $W \sim \mathrm{GEV}(p,a)$ with $p \in \mathbb{R}$, $a > 0$ if it has quantile function

$$Q_{GEV}(u; p, a) = -a \cdot g(-\ln u; -p).$$

The name Generalized Extreme Value distribution follows from its unifying representation of the three types of extreme value distributions derived by Fisher and Tippett (1928). However, the GEV can also be naturally referred to as the Power Exponential Distribution since it can be derived by taking the power transformation of an Exponential(1) random variable.

By inverting the quantile function, the distribution function is

$$F_{GEV}(w; p, a) = \begin{cases} \exp\left\{-\left(1 + \dfrac{pw}{a}\right)^{-1/p}\right\}, & p < 0,\ 0 < w < -a/p \\ \exp\left\{-e^{-w/a}\right\}, & p = 0,\ w > 0 \\ \exp\left\{-\left(1 + \dfrac{pw}{a}\right)^{-1/p}\right\}, & p > 0,\ w > 0, \end{cases}$$

and it follows that the density function is

$$f_{GEV}(w; p, a) = \begin{cases} \dfrac{1}{a}\left(1 + \dfrac{pw}{a}\right)^{(-1/p)-1} \exp\left\{-\left(1 + \dfrac{pw}{a}\right)^{-1/p}\right\}, & p < 0,\ 0 < w < -a/p \\ \dfrac{1}{a}\,e^{-w/a} \exp\left\{-e^{-w/a}\right\}, & p = 0,\ w > 0 \\ \dfrac{1}{a}\left(1 + \dfrac{pw}{a}\right)^{(-1/p)-1} \exp\left\{-\left(1 + \dfrac{pw}{a}\right)^{-1/p}\right\}, & p > 0,\ w > 0. \end{cases}$$

To obtain tail estimates, choose, as a function of n, a threshold percentile $t_n$. Then let the threshold value be given by $T_n = \tilde{Q}(1 - t_n)$. Compute the exceedences $X_i - T_n = X_i - \tilde{Q}(1 - t_n)$ for all $X_i > T_n = \tilde{Q}(1 - t_n)$ and treat them as a random sample from a GEV(p, a).
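The GEV definitions and the tail equivalence with the GPD can be illustrated numerically. As before, this is only a sketch: the power transformation is assumed to be $g(u; p) = (u^p - 1)/p$ with limit $\ln u$ at $p = 0$, and the function names and parameter values are hypothetical.

```python
import math

def g(u, p):
    """Assumed power transformation g(u; p) = (u**p - 1)/p, with ln(u) at p = 0."""
    return math.log(u) if p == 0 else (u ** p - 1.0) / p

def gev_quantile(u, p, a):
    """Q_GEV(u; p, a) = -a * g(-ln u; -p)."""
    return -a * g(-math.log(u), -p)

def gpd_quantile(u, p, a):
    """Q_GPD(u; p, a) = -a * g(1 - u; -p), for comparison in the upper tail."""
    return -a * g(1.0 - u, -p)

def gev_cdf(w, p, a):
    """F_GEV(w; p, a), the inverse of the quantile function."""
    if p == 0:
        return math.exp(-math.exp(-w / a))
    return math.exp(-((1.0 + p * w / a) ** (-1.0 / p)))

p, a = 0.25, 2.0
# Quantile and distribution function are inverses of each other.
round_trip = [gev_cdf(gev_quantile(u, p, a), p, a) for u in (0.5, 0.9, 0.99)]
# As u -> 1, -ln(u) ~ 1 - u, so the GEV and GPD quantiles agree in the tail.
tail_ratio = gev_quantile(0.999, p, a) / gpd_quantile(0.999, p, a)
```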

4.2  Parameters  Estimated  By  Maximum  Likelihood 

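Assuming the exceedences are a random sample from a GEV distribution, the maximum likelihood estimates of p and a are the values which maximize the log-likelihood

$$\mathcal{L}_{GEV}(p, a; Y) = \begin{cases} -[nt_n]\ln a - \left(\dfrac{1}{p} + 1\right)\displaystyle\sum_{i=1}^{[nt_n]} \ln\left(1 + \dfrac{pY_i}{a}\right) - \displaystyle\sum_{i=1}^{[nt_n]} \left(1 + \dfrac{pY_i}{a}\right)^{-1/p}, & p < 0,\ a > -p \cdot Y([nt_n];[nt_n]) \\ -[nt_n]\ln a - \dfrac{1}{a}\displaystyle\sum_{i=1}^{[nt_n]} Y_i - \displaystyle\sum_{i=1}^{[nt_n]} e^{-Y_i/a}, & p = 0,\ a > 0 \\ -[nt_n]\ln a - \left(\dfrac{1}{p} + 1\right)\displaystyle\sum_{i=1}^{[nt_n]} \ln\left(1 + \dfrac{pY_i}{a}\right) - \displaystyle\sum_{i=1}^{[nt_n]} \left(1 + \dfrac{pY_i}{a}\right)^{-1/p}, & p > 0,\ a > 0, \end{cases}$$

where $[\cdot]$ denotes the greatest integer operation and $Y_i = X_i - \tilde{Q}(1 - t_n)$ for all $X_i > \tilde{Q}(1 - t_n)$, with $Y([nt_n];[nt_n]) = \max\{Y_1, \ldots, Y_{[nt_n]}\}$.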

Maximum likelihood estimation for the GEV is often criticized because it must be performed numerically. However, Hosking (1985) provides an algorithm for finding the GEV maximum likelihood estimates for a random sample based on Newton-Raphson iteration with some modifications designed to improve the rate of convergence. This algorithm performs well for $|p| < \tfrac{1}{2}$ and $nt_n > 15$.

The large sample properties of these estimators do not follow directly from large sample maximum likelihood theory since the exceedences are not a sample from a GEV distribution in general. To derive the asymptotic results, first express the estimators as solutions to a set of estimating equations, take the Taylor's series expansion, and then compute the asymptotic distribution of each term. This approach yields the following result.

THEOREM 4.2.1. Suppose that $fQ(1-u) = u^{p+1}L(u)$, where $p_0 < p < 1$, $p \neq 0$, where $p_0$ is the only real root of $h(p) = 0$ on the interval $(-1, 1)$ where

$$(4.2.1)\qquad \begin{aligned} h(p) = {} & 24p^{17} - 46p^{16} - 345p^{15} + 520p^{14} + 1715p^{13} \\ & - 1490p^{12} - 3877p^{11} + 584p^{10} + 5729p^{9} - 6428p^{8} \\ & - 7150p^{7} + 24532p^{6} - 2184p^{5} - 2672p^{4} - 7072p^{3} \\ & - 8512p^{2} - 4224p - 768. \end{aligned}$$

Approximately, $p_0 \approx -.356967$. Also, suppose

$$\left| \frac{L(t)}{L(tu)} - 1 \right| < A(u)R(t)$$

for some positive measurable functions $A(u)$ and $R(t)$ where $\lim_{t \to 0^+} R(t) = 0$. Let $\{T_n\}$ be a sequence of thresholds defined on $(Q(0), Q(1))$ such that $nt_n^* \to \infty$ and $t_n^* \to 0$ as $n \to \infty$, where $t_n^* = 1 - F(T_n)$. Further, let $a = a(t_n^*)$ be the scalar value of a function $a(\cdot)$ satisfying $a(t_n^*)\,hQ(1 - t_n^*) = 1 + O(R_1(t_n^*))$ for some positive measurable function $R_1(t)$ with $\lim_{t \to 0^+} R_1(t) = 0$. Let $(\hat{p}_n, \hat{a}_n)$ denote the maximum likelihood estimates from the GEV model for the exceedences. Then conditional on $T_n$,


Pn 

L  an  J 


/ 

P  ~  4 

+  0(ft-(O) 

\ 

is  AN 

4 p(p  -  2)2 

a+  1 

»  ([ntn|)”lvGEV 

V 

2a(p  +  2) 

7 

as $nt_n^* \to \infty$, where $R^*(t) = \max\{R(t), R_1(t)\}$ and

$$V_{GEV} = \frac{1}{h(p)} \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix}$$

with $h(p)$ given in (4.2.1) and

$$v_{11} = 4p^4(p-2)^4(p-1)^3(p+1)^2(p+2)^3(2p+1)(3p^2 + 4p + 2) + O(R^*(t_n^*)),$$

$$\begin{aligned} v_{22} = {} & 2a^2p^2(p-2)(p+1)(p+2)^3(2p+1) \\ & \cdot (8p^{11} - 50p^{10} + 81p^{9} + 63p^{8} - 374p^{7} + 668p^{6} \\ & \quad - 623p^{5} + 203p^{4} - 16p^{3} - 32p^{2} + 96p + 48) + O(R^*(t_n^*)), \end{aligned}$$

$$\begin{aligned} v_{12} = v_{21} = {} & 4ap^5(p-2)^2(p-1)^3(p+1)(p+2)^2(2p+1) \\ & \cdot (p^4 + 6p^3 - 21p^2 - 48p + 16) + O(R^*(t_n^*)). \end{aligned}$$
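The stated value of $p_0$ can be checked against the coefficients of (4.2.1). The sketch below evaluates $h(p)$ by Horner's rule and bisects on $[-1, 0]$; the bracket is an assumption based on the signs $h(-1) > 0$ and $h(0) = -768 < 0$, and the function names are illustrative.

```python
# Coefficients of h(p) in (4.2.1), from p^17 down to the constant term.
H_COEFFS = [24, -46, -345, 520, 1715, -1490, -3877, 584, 5729, -6428,
            -7150, 24532, -2184, -2672, -7072, -8512, -4224, -768]

def h_poly(p):
    """Evaluate h(p) by Horner's rule."""
    value = 0.0
    for coefficient in H_COEFFS:
        value = value * p + coefficient
    return value

def bisect_root(f, lo, hi, tol=1e-10):
    """Plain bisection on a sign-changing bracket, stopping on interval width."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

p0 = bisect_root(h_poly, -1.0, 0.0)
```

The root found this way should agree with the approximate value $-.356967$ quoted in the theorem.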


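PROOF. Let $(\hat{p}_n, \hat{a}_n)$ denote the maximum likelihood estimates derived assuming the exceedences over the threshold $T_n$ are a random sample from a GEV distribution. Then, $(\hat{p}_n, \hat{a}_n)$ is the solution to

$$0 = ([nt_n])^{-1/2} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GEV}(\hat{p}_n, \hat{a}_n; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GEV}(\hat{p}_n, \hat{a}_n; X_i - T_n)}{\partial a} \end{bmatrix}.$$

Take the first order Taylor's series expansion about the true parameter values $(p, a)$ of the right hand side to obtain, for some point $(p^*, a^*)$ on the line segment between $(\hat{p}_n, \hat{a}_n)$ and $(p, a)$,

$$0 = ([nt_n])^{-1/2} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} + ([nt_n])^{-1} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial^2 \mathcal{L}_{GEV}(p^*, a^*; X_i - T_n)}{\partial p^2} & \dfrac{\partial^2 \mathcal{L}_{GEV}(p^*, a^*; X_i - T_n)}{\partial p\,\partial a} \\[1.5ex] \dfrac{\partial^2 \mathcal{L}_{GEV}(p^*, a^*; X_i - T_n)}{\partial p\,\partial a} & \dfrac{\partial^2 \mathcal{L}_{GEV}(p^*, a^*; X_i - T_n)}{\partial a^2} \end{bmatrix} \cdot ([nt_n])^{1/2} \begin{bmatrix} \hat{p}_n - p \\ \hat{a}_n - a \end{bmatrix}.$$

From the Central Limit Theorem for an iid sequence,

$$([nt_n])^{-1/2} \sum_{i=1}^{[nt_n]} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} \text{ is } AN\left(\mu_n,\ ([nt_n])^{-1}\,\Sigma_n\right)$$

as $nt_n^* \to \infty$, where

$$\mu_n = E_{X-T_n \mid X>T_n} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} = \begin{bmatrix} \dfrac{p - 4}{4p(p-2)^2} + O(R^*(t_n^*)) \\[1.5ex] \dfrac{1}{2a(p+2)} + O(R^*(t_n^*)) \end{bmatrix}$$

and

$$\Sigma_n = \mathrm{Cov}_{X-T_n \mid X>T_n} \begin{bmatrix} \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial p} \\[1.5ex] \dfrac{\partial \mathcal{L}_{GEV}(p, a; X_i - T_n)}{\partial a} \end{bmatrix} = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}$$

with

$$\sigma_{11} = \frac{8p^{11} - 50p^{10} + 81p^{9} + 63p^{8} - 374p^{7} + 668p^{6} - 623p^{5} + 203p^{4} - 16p^{3} - 32p^{2} + 96p + 48}{4p^4(p-2)^3(p-1)^3(p+1)(p+2)(2p+1)} + O(R^*(t_n^*)),$$

$$\sigma_{22} = \frac{3p^2 + 4p + 2}{2a^2p^2(p+2)(2p+1)} + O(R^*(t_n^*)),$$

$$\sigma_{12} = \sigma_{21} = \frac{p^4 + 6p^3 - 21p^2 - 48p + 16}{2ap(p-2)^2(p+1)(p+2)^2(2p+1)} + O(R^*(t_n^*)).$$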


This result follows from the expression of the gradient vector for the GEV log-likelihood which is given in Appendix E and the moments given in Theorem 2.4.1.

For n sufficiently large to make the $O(R^*(t_n^*))$ terms negligible, notice that $\Sigma_n$ is positive definite if and only if $p_0 < p < 1$, where $p_0$ is the only real root of $h(p) = 0$ on the interval $(-1, 1)$ for $h(p)$ given in (4.2.1). Approximately, $p_0 \approx -.356967$.

If $\Sigma_n$ is positive definite, then from the Weak Law of Large Numbers and the fact that $(p^*, a^*) \xrightarrow{P} (p, a)$ as $nt_n^* \to \infty$, it follows that


Kl 

(K))-1  £ 


i  - 1 


a'lcGEvy’,^\Xi-Tn) 

dp 2 

d2£GEV{p\«'\Xi-Tn) 

dpda 


a2£aEv(p',a-;X,-T„) 

dpda 

a2iGEvO>',<-‘;*,--r,.) 

da 2 


1  O' 

.0  1. 


as $nt_n^* \to \infty$. This result follows from the expression of the Hessian matrix for the GEV log-likelihood which is given in Appendix E and the moments given in Theorem 2.4.1.

The asymptotic distribution of $(\hat{p}_n, \hat{a}_n)$ then follows from Slutsky's Theorem, observing that $V_{GEV} = \Sigma_n^{-1}$. □

4.3 Parameters Estimated From The Sample Quantile Process

Other parameter estimates can be obtained by applying the theory of regression
analysis on continuous-parameter time series from the reproducing kernel
Hilbert space (RKHS) point of view to the sample quantile process $\tilde{Q}(u)$. The
justification for this approach was given in Subsection 3.3, but the main
consequence to statistical inference is that for a random sample from a population
with quantile function $Q_0(u)$ and density-quantile function $f_0Q_0(u)$,

$$\sqrt{n}\, f_0Q_0(u)\left(\tilde{Q}(u) - Q_0(u)\right) \xrightarrow{\;d\;} B(u)$$

where $B(u)$ is a Brownian bridge.

The GEV model for exceedences of a threshold assumes that for given $t_n$,

$$Q_{X-T_n \mid X>\tilde{Q}(1-t_n)=T_n}(u; t_n, T_n) = -\alpha \cdot g(-\ln u; -\rho)$$

with

$$fQ_{X-T_n \mid X>\tilde{Q}(1-t_n)=T_n}(u; t_n, T_n) = \frac{u(-\ln u)^{\rho+1}}{\alpha}.$$

The sample quantile process for the exceedences for given $t_n$ is

$$\tilde{Q}_{X-T_n \mid X>\tilde{Q}(1-t_n)=T_n}(u; t_n, T_n) = \tilde{Q}(1 - t_n(1-u)) - \tilde{Q}(1-t_n).$$

Therefore, estimating $\rho$ and $\alpha$ becomes a problem in regression analysis of
continuous-parameter time series by writing

$$u(-\ln u)^{\rho+1}\left[\tilde{Q}(1 - t_n(1-u)) - \tilde{Q}(1-t_n)\right]
= -\alpha\, u(-\ln u)^{\rho+1} g(-\ln u; -\rho) + \sigma_B B(u)$$

where $\sigma_B = \alpha/\sqrt{[nt_n]}$ is treated as a free parameter not constrained to be related
to $\alpha$.

Estimators can be formed from this time series regression using the reproducing
kernel inner product corresponding to the Brownian bridge covariance
kernel $K_B(s,t)$. This RKHS consists of $L_2$ differentiable functions $f$, $g$ on the
interval $p \le u \le q$ with inner product

$$\langle f, g\rangle_{p,q} = \int_p^q f'(u)g'(u)\,du + \frac{1}{p} f(p)g(p) + \frac{1}{1-q} f(q)g(q).$$

Since this model is not linear in the parameters, no closed form expressions
exist for the estimators. However, drawing from the applications paradigm of
the Box-Cox (1964) transformation in regression, estimates of $(\rho, \alpha)$ are found
from $\tilde{Q}(\cdot)$ by:

1. Choose a reasonable range of values for $\rho$.

2. For each value $\rho^{(k)}$ the model is linear in $\alpha$, so compute

$$\hat\alpha^{(k)} = -\frac{\left\langle u(-\ln u)^{\rho+1} g(-\ln u; -\rho),\; u(-\ln u)^{\rho+1}\left[\tilde{Q}(1 - t_n(1-u)) - \tilde{Q}(1-t_n)\right]\right\rangle_{p,q}}{\left\langle u(-\ln u)^{\rho+1} g(-\ln u; -\rho),\; u(-\ln u)^{\rho+1} g(-\ln u; -\rho)\right\rangle_{p,q}}, \qquad \rho = \rho^{(k)},$$

and compute the value of a specified loss function $R(\rho^{(k)}, \hat\alpha^{(k)})$.

3. Choose the estimate of $(\rho, \alpha)$ which minimizes $R(\cdot)$.
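The three steps above can be sketched numerically. The following is only an illustration under stated assumptions, not the procedure as implemented in this work: `g` is taken to be $g(x;\rho) = (x^\rho - 1)/\rho$ (with the logarithm as the $\rho = 0$ limit), the RKHS inner product is discretized on a grid with finite differences, and the loss is assumed to be the squared RKHS norm of the residual, since the text leaves the loss function unspecified. The names `g`, `inner` and `profile_fit` are hypothetical.

```python
import math

def g(x, rho):
    """Assumed form g(x; rho) = (x**rho - 1)/rho, with log(x) as the rho = 0 limit."""
    return math.log(x) if abs(rho) < 1e-12 else (x ** rho - 1.0) / rho

def inner(f, h, u):
    """Discretized Brownian-bridge RKHS inner product on [u[0], u[-1]]."""
    total = f[0] * h[0] / u[0] + f[-1] * h[-1] / (1.0 - u[-1])
    for i in range(len(u) - 1):
        du = u[i + 1] - u[i]
        # finite-difference approximation of the integral of f'(u) h'(u)
        total += (f[i + 1] - f[i]) * (h[i + 1] - h[i]) / du
    return total

def profile_fit(u, q_exc, rho_grid):
    """For each rho on the grid solve for alpha, then keep the pair with the smallest loss."""
    best = None
    for rho in rho_grid:
        w = [ui * (-math.log(ui)) ** (rho + 1.0) for ui in u]
        x = [wi * g(-math.log(ui), -rho) for wi, ui in zip(w, u)]
        y = [wi * qi for wi, qi in zip(w, q_exc)]
        alpha = -inner(x, y, u) / inner(x, x, u)
        resid = [yi + alpha * xi for yi, xi in zip(y, x)]
        loss = inner(resid, resid, u)  # assumed loss: squared RKHS norm of the residual
        if best is None or loss < best[2]:
            best = (rho, alpha, loss)
    return best

# Noise-free check: exceedance quantiles generated exactly from the GEV form
# with rho = 0.3 and alpha = 2 should be recovered when 0.3 is on the grid.
u = [0.05 + 0.9 * i / 400 for i in range(401)]
q_exc = [-2.0 * g(-math.log(ui), -0.3) for ui in u]
rho_grid = [round(-0.5 + 0.1 * k, 1) for k in range(16)]
rho_hat, alpha_hat, loss = profile_fit(u, q_exc, rho_grid)
```

With a real sample, `q_exc` would be replaced by the sample quantile process of the exceedences evaluated on the grid `u`; the finite-difference derivative is then noisy, so the grid spacing acts as a smoothing choice.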
5. PARAMETER ESTIMATES FROM MODELS BASED ON
REGULARLY VARYING EXPRESSIONS FOR THE TAILS

5.1  Parametric  Modeling  For  The  Tail 

A third approach to formulating tail estimates is to derive parametric models
for the tails of the quantile function, distribution function, and density function
from $fQ(1-u) = u^{\rho+1}L(u)$. Assuming these models hold beyond a given threshold,
parameter estimates are derived from the largest order statistics.

The  results  of  the  following  theorem  motivate  the  probability  models  for  the 
tails. 

THEOREM 5.1.1. Suppose that $fQ(1-u) = u^{\rho+1}L(u)$, where $\rho \in \mathbb{R}$ and
$L(u)$ is slowly varying as $u \to 0^+$. Then

(a) the quantile function can be represented as

(i) $Q(1-u) \sim Q(1) + u^{-\rho} \cdot L_Q(u)$ as $u \to 0^+$ for $\rho < 0$;

(ii) $Q(1-u) \sim u^{-\rho} \cdot L_Q(u)$ as $u \to 0^+$ for $\rho > 0$;

(b) the distribution function can be represented as

(i) $F[Q(1) - x] \sim 1 - x^{-1/\rho} \cdot L_F(x)$ as $x \to 0^+$ for $\rho < 0$ assuming
$L(u[-\rho L(u)]^{1/\rho})/L(u) \to 1$ locally uniformly in $\rho < 0$ as $u \to 0^+$;

(ii) $F(x) \sim 1 - x^{-1/\rho} \cdot L_F(x)$ as $x \to \infty$ for $\rho > 0$ assuming
$L(u)/L(u[\rho L(u)]^{1/\rho}) \to 1$ locally uniformly in $\rho > 0$ as $u \to 0^+$;

(c) the density function can be represented as

(i) $f[Q(1) - x] \sim (-1/\rho)x^{-(1/\rho)-1} \cdot L_F(x)$ as $x \to 0^+$ for $\rho < 0$ assuming
$L(u[-\rho L(u)]^{1/\rho})/L(u) \to 1$ locally uniformly in $\rho < 0$ as $u \to 0^+$ and $f$
is ultimately monotone;

(ii) $f(x) \sim (1/\rho)x^{-(1/\rho)-1} \cdot L_F(x)$ as $x \to \infty$ for $\rho > 0$ assuming
$L(u)/L(u[\rho L(u)]^{1/\rho}) \to 1$ locally uniformly in $\rho > 0$ as $u \to 0^+$ and
$f$ is ultimately monotone;

where $L_Q(u) = 1/[\rho L(u)]$, which is slowly varying as $u \to 0^+$; for $\rho < 0$,
$L_F(x) = [-\rho L(x^{-1/\rho})]^{-1/\rho}$, which is slowly varying as $x \to 0^+$; and for $\rho > 0$,
$L_F(x) = [\rho L(x^{-1/\rho})]^{-1/\rho}$, which is slowly varying as $x \to \infty$.


Tables 3-5 contain, respectively, examples of the quantile function, distribution
function, and density function for some common parametric probability
models derived from the tail behavior model and expressed in the form given in
Theorem 5.1.1.

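Taking $L(u) = 1/A$, $A > 0$ for $u > t$ for some threshold percentile $t$ suggests
the tail parameterizations

$$Q(u) = \begin{cases} Q(1) - (1-u)^{-\rho} \cdot (A/(-\rho)), & \rho < 0 \\ (1-u)^{-\rho} \cdot A/\rho, & \rho > 0 \end{cases}$$

$$F(x) = \begin{cases} 1 - [Q(1) - x]^{-1/\rho} \cdot (-\rho/A)^{-1/\rho}, & \rho < 0 \\ 1 - x^{-1/\rho} \cdot (\rho/A)^{-1/\rho}, & \rho > 0 \end{cases}$$

$$f(x) = \begin{cases} (-1/\rho)[Q(1) - x]^{-(1/\rho)-1} \cdot (-\rho/A)^{-1/\rho}, & \rho < 0 \\ (1/\rho)x^{-(1/\rho)-1} \cdot (\rho/A)^{-1/\rho}, & \rho > 0 \end{cases}$$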


Notice  that  this  parameterization  requires  that  the  class  of  tail  behavior  is 
known.  Each  case  will  be  treated  separately  in  the  following  subsections. 
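As a quick consistency check on this parameterization, the sketch below codes the three tail functions for both classes and confirms numerically that $F(Q(u)) = u$ and that $f(Q(u))$ equals the density-quantile value $(1-u)^{\rho+1}/A$ implied by $L(u) = 1/A$. The function names and the numerical parameter values are illustrative assumptions only.

```python
import math

def tail_quantile(u, rho, A, q1=None):
    """Q(u) under L(u) = 1/A; q1 = Q(1) is needed only in the short tailed case rho < 0."""
    if rho < 0:
        return q1 - (1.0 - u) ** (-rho) * (A / (-rho))
    return (1.0 - u) ** (-rho) * A / rho

def tail_cdf(x, rho, A, q1=None):
    """F(x) for the same parameterization."""
    if rho < 0:
        return 1.0 - ((q1 - x) * (-rho / A)) ** (-1.0 / rho)
    return 1.0 - (x * rho / A) ** (-1.0 / rho)

def tail_pdf(x, rho, A, q1=None):
    """f(x) for the same parameterization."""
    if rho < 0:
        return (-1.0 / rho) * (q1 - x) ** (-1.0 / rho - 1.0) * (-rho / A) ** (-1.0 / rho)
    return (1.0 / rho) * x ** (-1.0 / rho - 1.0) * (rho / A) ** (-1.0 / rho)

# Round trips for one long tailed and one short tailed example (illustrative values).
u = 0.9
x_long = tail_quantile(u, rho=0.5, A=2.0)
x_short = tail_quantile(u, rho=-0.25, A=1.5, q1=1.0)
checks = [
    tail_cdf(x_long, 0.5, 2.0) - u,
    tail_cdf(x_short, -0.25, 1.5, q1=1.0) - u,
    tail_pdf(x_long, 0.5, 2.0) - (1.0 - u) ** 1.5 / 2.0,
    tail_pdf(x_short, -0.25, 1.5, q1=1.0) - (1.0 - u) ** 0.75 / 1.5,
]
```

Each entry of `checks` should be zero up to floating point rounding, which is what makes the three displays a single consistent model rather than three separate assumptions.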

Note that determining the class of tail behavior for the underlying distribution
is equivalent to determining the domain of attraction of the extreme value
distribution since

(i) $\rho < 0 \Leftrightarrow$ Domain of Attraction is the Type III Extreme Value Distribution;
(ii) $\rho > 0 \Leftrightarrow$ Domain of Attraction is the Type II Extreme Value Distribution.

Therefore, one method for determining the class of tail behavior is to evaluate
the probability modeling assumptions and determine the domain of attraction.

A diagnostic derived from the sample which can be used to determine the
class of tail behavior is the Identification Quantile (IQ) Box Plot defined by
Parzen (1983). For an arbitrary random variable $Y$, define the identification
quantile standardized random variable $ZQI = (Y - Q(.5))/\sigma$, where $Q(.5)$ is the
median and $\sigma = 2[Q(.75) - Q(.25)]$ is the quartile deviation. Appendix A contains

Table 3

Quantile functions of common parametric probability models expressed in the
representation given in Theorem 5.1.1(a) based on the tail behavior model
$fQ(1-u) = u^{\rho+1}L(u)$.


Distribution 

Quantile  Function 

Uniform 

<5(1  -  u)  =  .5  -  u 

=  <5(1)  +  u  •  (-1) 

Neg.  Exponential 

<5(1  -  u)  =  [In 2/(2 In 3))  +  [ln(l  -  u)/(2  In 3)] 

~  <5(1)  +  «•  (—1/(2 ln3))[l  +  .5uj  asu-0+ 

Neg.  Weibull(p) 

<5(1  -  n)  =  ([In 2} l/p/o)  -  ([-  ln(l  -  u)\l/*fo) 

~  Q(l)  +  u1/p  •  {—\/a)\\  +  .5(3  -  p-1)u) 

as  u  — ♦  0+ 

Cauchy 

<5(1  -u)  =  Jtanx(J  -  u) 

~  u"1  •  (l/4x)[l  -  (ir2/3)u2]  as  u  -+  0+ 

Pareto  (p) 

<5(1  -  u)  =  u~l/P .  (l/a)[l  -  2 ‘/PttVP] 

Fr6chet(p) 

0(1  -  <0  =  ([-  Ml  -  -  (|to 2\-'/”lo) 

~  u-1/p  •  (l/u)[l  +  .5(3  4-  p-1)u]  as  u  -+  0+ 

Note: See Appendix A for the definition of $\sigma$, a different scale constant for each
different distribution.

Table 4

Distribution functions of common parametric probability models expressed in the
representation given in Theorem 5.1.1(b) based on the tail behavior model
$fQ(1-u) = u^{\rho+1}L(u)$.

Distribution  Distribution  Function 


Uniform 


F(Q(l)  -  x)  =  1  -  x  •  7(0  <  i  <  1) 


Neg.  Exponential  F(Q(l)  -  i)  = 


x  <  0 


( e-(2ln3)x  x>0 

~  1  —  x  •  (2  In  3) [1  —  (In  3)x]  as  x  — *  0+ 


Neg.  Weibull(p)  F(Q(  1)  -  x)  = 


(1,  x  <  0 

x  >  0 


~  1  -  xp  •  op[l  +  .5(1  -  3 p)x* 


Cauchy 


Pareto(p) 


F(x)  =  .5  +  (l/x)  tan  *4x 

~  1  -  x-1  •  (l/4?r)[l  -  (l/48)x-2]  as  x  -*  oo 


-c 


(0,  x  <  [1  -  21/p]/o 

1  1  -  [21/?  +  ax]-P,  x  >  [1  -  21/P]/o 
1  -  x-p  •  <7-p[l  —  (p21/p/<r)x-1] 


as  x  — ►  oo 


Fr4chet(p) 


F(x)  = 


x  <  —(In 2) 

exp{-[(ln  2)-1/p  +  oxj-p}, 

x  >  — (ln2)-1/p/(7 


~  1  -  x  P-o  p[l  -  .5(1  +  3p)x  p]  as  x  — ►  oo 

Note: See Appendix A for the definition of $\sigma$, a different scale constant for each
different distribution.

Table 5

Probability density functions of common parametric probability models expressed
in the representation given in Theorem 5.1.1(c) based on the tail behavior model
$fQ(1-u) = u^{\rho+1}L(u)$.

Distribution  Density  Function 


Uniform 


f(Q{  1)  —  z)  =  x°  •  1(0  <  x  <  1) 


Neg.  Exponential  /(Q(l)  -  x)  = 


(2ln3)e"(2ln3)1,  z>0 


otherwise 


Neg.  Weibull(p)  f(Q(  1)  -  x)  = 


x°  •  (2  ln3)[l  —  (2  ln3)x]  as  i  -+  0+ 

f  papxp-1e-(DQ  2)P,  z  >  0 
1 0,  otherwise 

pxp_1  ■  <7p[l  +  .5(1  -  3p)xp] 


Cauchy 


Pareto  (p) 


/( X)  =  (4/»)  ■  1/(1  +  16l2) 

~  x-2  •  (l/4x)(l  —  (l/16)x~2]  as  z  — ►  oo 


/(*)  = 


|  0,  x  <  (1  -  2 lfP)/a 

1  <rp[21/p  +  <7z]-p_1,  z  >  (1  —  21/p)/o 
px~p_1  •  o-p(l  -  (21/p(p  +  1  )/o)x~1} 


as  z  — *  oo 


z  <  -(ln2)  1//p/<7 

Fr4chet(p)  f(x)=<  ap[(ln2)“l/p  +  <rxj“p~1 

•  exp{-[(In  2)_1/p  +  oz]_p}, 

z  > -(ln2)-1/p/a 

~  px~p-1  •  <7-p[l  +  .5(1  +  3p)x-p]  as  z  — ♦  oo 

Note: See Appendix A for the definition of $\sigma$, a different scale constant for each
different distribution.

many common parametric models expressed in the identification standardized
version.

The standardized $ZQI$ has identification quantile function $QI(u) = (Q(u) -
Q(.5))/\sigma$. At $u = .5$, $QI(u)$ is equal to zero and has slope approximately one.
This approximately tangent line is a basis for comparing the tail behavior of
different distributions as $u \to 0^+$ or $u \to 1^-$. Figure 4 shows an overlay of the
identification quantile functions for the Uniform distribution ($\rho < 0$), Normal
distribution, and Cauchy distribution ($\rho > 0$), each of which is discussed in
Appendix A. Notice from the figure that the three types of tail behavior are
clearly differentiated as $u \to 1^-$.


FIG.  4.  Identification  Quantile  Plots  for  the  Uniform  (p  <  0),  Normal,  and 
Cauchy  (p  >  0)  Distributions  clearly  differentiating  the  types  of  tail  behavior. 


Therefore, a useful diagnostic for tail behavior is an estimate of the
identification quantile function. Let

$$\tilde{Q}I(u) = \frac{\tilde{Q}(u) - \tilde{Q}(.5)}{2[\tilde{Q}(.75) - \tilde{Q}(.25)]}$$

and display this function graphically in the IQ box plot. The IQ box plot is a
graph of $\tilde{Q}I(u)$ for $0 < u < 1$ with informative overlays to help in using the plot
as a diagnostic for classifying tail behavior.

The first of these overlays is used to indicate short tail behavior or
equivalently $\rho < 0$. From the plots of the quantile functions of short tailed distributions
given in Appendix A, it is seen that, as a group, these quantile functions hardly
deviate from the approximately tangent line at $u = .5$ with intercept zero and
slope one (which is the identification quantile function of the Uniform
distribution). Drawing this tangent line in the IQ box plot permits a visual diagnostic
of short tail behavior.

Notice that the identification quantile of the Normal distribution drawn in
Figure 4 is nearly equal to one at $u = 1$. It is useful to truncate the plots on
$-1 < \tilde{Q}I(u) < 1$ to allow comparison to the Normal since it is an important
special case in the family of parametric models and assumptions of normality are
often made.

Long tail behavior or equivalently $\rho > 0$ is indicated when the sample
identification quantile function exceeds the truncation line for values of $u$ less than
one.
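The quantity plotted in the IQ box plot can be sketched in a few lines. This is an illustration only: the sample quantile convention (linear interpolation between order statistics) and the function names are assumptions, and the "exceeds the truncation line" rule is coded as a simple flag rather than as the graphical diagnostic described above.

```python
import math

def sample_quantile(sorted_x, u):
    """Linear-interpolation sample quantile (one of several common conventions)."""
    pos = u * (len(sorted_x) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def sample_iq(x, u):
    """Sample identification quantile: centered at the median, scaled by twice the IQR."""
    s = sorted(x)
    scale = 2.0 * (sample_quantile(s, 0.75) - sample_quantile(s, 0.25))
    return (sample_quantile(s, u) - sample_quantile(s, 0.5)) / scale

def exceeds_truncation(x, u_values):
    """True if the sample IQ function leaves [-1, 1] at any u in u_values."""
    return any(abs(sample_iq(x, u)) > 1.0 for u in u_values)

# Deterministic pseudo-samples built from exact quantiles (no random numbers).
uniform_like = [i / 1000 for i in range(1001)]
cauchy_like = [math.tan(math.pi * ((i + 0.5) / 2000 - 0.5)) for i in range(2000)]
u_values = [0.90, 0.95, 0.99]
uniform_flag = exceeds_truncation(uniform_like, u_values)
cauchy_flag = exceeds_truncation(cauchy_like, u_values)
```

For the Uniform pseudo-sample the IQ function stays on the line $u - .5$, so the flag is false; for the Cauchy pseudo-sample it passes the truncation line before $u$ reaches one, which is the long tail signal described in the text.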

5.2  Estimation  Assuming  The  Class  Of  Tail  Behavior  Is  Short  Tailed 

To obtain tail estimates when the tail behavior is known to be short tailed,
choose, as a function of $n$, a threshold percentile $t_n$. Then let the threshold value
be given by $T_n = \tilde{Q}(1 - t_n)$. Parameter estimates are obtained by assuming that
the underlying distribution satisfies

(5.2.1) $Q(u) = Q(1) - (1-u)^{-\rho} \cdot (A/(-\rho))$ for $1 - t_n < u < 1$

(5.2.2) $F(x) = 1 - [Q(1) - x]^{-1/\rho} \cdot (-\rho/A)^{-1/\rho}$ for $\tilde{Q}(1 - t_n) < x < Q(1)$

(5.2.3) $f(x) = (-1/\rho)[Q(1) - x]^{-(1/\rho)-1} \cdot (-\rho/A)^{-1/\rho}$ for $\tilde{Q}(1 - t_n) < x < Q(1)$

where $\rho < 0$, $Q(1) \in \mathbb{R}$, and $A > 0$ are unknown parameter values.

Hall (1982) derives parameter estimates based on the largest order statistics.
Stated in terms of the exceedence over threshold approach, the $[nt_n] + 1$ largest
order statistics are given by

$$\tilde{Q}(1 - t_n) = T_n = X(n - [nt_n]; n) < X(n - [nt_n] + 1; n) < \cdots < X(n; n).$$

If the underlying distribution satisfies (5.2.1-5.2.3), the log-likelihood of the
$[nt_n] + 1$ largest order statistics is given by

$$\begin{aligned}
\ell_{ST}(\rho, Q(1), A; X) ={}& \ln\frac{n!}{(n - [nt_n] - 1)!}
- ([nt_n] + 1)\left(\frac{1}{\rho} + 1\right)\ln(-\rho) + ([nt_n] + 1)\frac{1}{\rho}\ln A \\
&- \left(\frac{1}{\rho} + 1\right)\sum_{i=1}^{[nt_n]+1}\ln[Q(1) - X(n - i + 1; n)] \\
&+ (n - [nt_n] - 1)\ln\left\{1 - \left[\frac{-\rho}{A}\,[Q(1) - X(n - [nt_n]; n)]\right]^{-1/\rho}\right\}
\end{aligned}$$

where $X(n;n) < Q(1) < \infty$, $\rho < 0$, and $A > 0$.

Solving $\partial\ell_{ST}(\rho, Q(1), A; X)/\partial A = 0$ expresses the estimate $\hat{A}$ given the
parameter estimates $\hat\rho$ and $\hat{Q}(1)$:

$$\hat{A} = -\hat\rho\,[t_n + (1/n)]^{\hat\rho}\,[\hat{Q}(1) - X(n - [nt_n]; n)]$$

(5.2.4) $\hat{A} \approx -\hat\rho\, t_n^{\hat\rho}\,[\hat{Q}(1) - X(n - [nt_n]; n)].$

Hall (1982) uses this expression to reduce the three parameter log-likelihood
$\ell_{ST}$ to a function of only $\rho$ and $Q(1)$ by defining, for $X(n;n) < Q(1) < \infty$,

$$\begin{aligned}
\ell^*_{ST}(\rho, Q(1); X) ={}& \ell_{ST}(\rho, Q(1), \hat{A}; X) \\
={}& \ln\left\{\frac{n!}{(n - [nt_n] - 1)!}\, t_n^{[nt_n]+1}(1 - t_n)^{n-[nt_n]-1}\right\} \\
&- ([nt_n] + 1)\ln(-\rho) + ([nt_n] + 1)\frac{1}{\rho}\ln[Q(1) - X(n - [nt_n]; n)] \\
&- \left(\frac{1}{\rho} + 1\right)\sum_{i=1}^{[nt_n]+1}\ln[Q(1) - X(n - i + 1; n)],
\end{aligned}$$

which is the same function (up to terms not involving $\rho$ and $Q(1)$) maximized
by Smith and Weissman (1985) to obtain parameter estimates.

If $\rho < -1$, the function $\ell^*_{ST}(\cdot)$ has no maximum since for any $\rho <
-1$, $\lim_{Q(1) \to X(n;n)^+} \ell^*_{ST}(\rho, Q(1); X) = \infty$. Therefore, Hall's estimates based
on the largest order statistics from a short tailed probability model are denoted
by $(\hat\rho, \hat{Q}(1), \hat{A})$, where $(\hat\rho, \hat{Q}(1))$ is the maximum of $\ell^*_{ST}(\cdot)$ in the constrained
parameter space $\{-1 < \rho < 0,\ X(n;n) < Q(1) < \infty\}$, and $\hat{A}$ is given
by (5.2.4).
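The objective that Hall's estimates maximize can be written down directly. The sketch below is a minimal illustration, not an implementation from this work: it evaluates the reduced log-likelihood up to an additive constant (the factorial and $t_n$ terms are dropped because they do not involve $\rho$ or $Q(1)$), returns $-\infty$ outside the constrained parameter space, and computes $\hat{A}$ from (5.2.4). The function names and the pseudo-sample are assumptions made for the example.

```python
import math

def hall_reduced_loglik(rho, q1, top):
    """Reduced short tailed log-likelihood, up to an additive constant.

    top holds the [n t_n] + 1 largest observations; its minimum is the threshold.
    """
    if not (-1.0 < rho < 0.0) or q1 <= max(top):
        return float("-inf")          # outside the constrained parameter space
    m = len(top)
    threshold = min(top)              # X(n - [n t_n]; n)
    return (-m * math.log(-rho)
            + (m / rho) * math.log(q1 - threshold)
            - (1.0 / rho + 1.0) * sum(math.log(q1 - x) for x in top))

def hall_A(rho, q1, threshold, t_n):
    """Scale estimate from (5.2.4), using the approximation with t_n."""
    return -rho * t_n ** rho * (q1 - threshold)

# Deterministic pseudo-sample of 100 top order statistics from a short tailed
# model with rho = -0.25 and Q(1) = 1 (exact quantiles, no random numbers).
top = [1.0 - ((i + 0.5) / 100) ** 0.25 for i in range(100)]
ll_true = hall_reduced_loglik(-0.25, 1.0, top)
ll_low = hall_reduced_loglik(-0.80, 1.0, top)
ll_high = hall_reduced_loglik(-0.05, 1.0, top)
```

In practice the function would be handed to a bounded two-dimensional optimizer over $(\rho, Q(1))$; the comparison of `ll_true` with the two alternatives only shows that the objective favors the generating value of $\rho$ on this pseudo-sample.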

To derive the large sample properties of Hall's estimators for the short tailed
case based on the largest order statistics, first express Hall's estimators for $\rho$
and $Q(1)$ as solutions to a set of estimating equations, take the Taylor's series
expansion, and then compute the asymptotic distribution of each term. The
distribution of $\hat{A}$ then follows since it is a function of $\hat\rho$ and $\hat{Q}(1)$. This approach
yields the following result.


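THEOREM 5.2.1. Suppose that $fQ(1-u) = u^{\rho+1}L(u)$, where $-\frac{1}{2} < \rho < 0$
and

$$\left|\frac{L(t)}{L(tu)} - 1\right| \le A(u)R(t)$$

for some positive measurable functions $A(u)$ and $R(t)$ where $\lim_{t \to 0^+} R(t) = 0$.
Let $\{T_n\}$ be a sequence of thresholds defined on $(Q(0), Q(1))$ such that $nt_n^* \to \infty$
and $t_n^* \to 0$ as $n \to \infty$, where $t_n^* = 1 - F(T_n)$. Further, suppose $-\rho[Q(1) -
T_n]hQ(1 - t_n^*) = 1 + O(R_1(t_n^*))$ for some positive measurable function $R_1(t)$ with
$\lim_{t \to 0^+} R_1(t) = 0$. Let $(\hat\rho_n, \hat{Q}_n(1), \hat{A}_n)$ denote Hall's estimates based on the
largest order statistics from a short tailed probability model. Then conditional on
$T_n$,

$$\begin{bmatrix} \hat\rho_n \\ \hat{Q}_n(1) \\ \hat{A}_n \end{bmatrix}
\text{ is } AN\left(
\begin{bmatrix} \rho + O(R^*(t_n^*)) \\ Q(1) + O(R^*(t_n^*)) \\ -\rho(t_n^*)^{\rho}[Q(1) - T_n] + O(R^*(t_n^*)) \end{bmatrix},\;
([nt_n^*])^{-1}V_{ST}\right)$$

as $nt_n^* \to \infty$, where $R^*(t) = \max\{R(t), R_1(t)\}$ and

$$V_{ST} = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \end{bmatrix}$$

with

$$\begin{aligned}
v_{11} &= (\rho + 1)^2 + O(R^*(t_n^*)), \\
v_{22} &= \left(\frac{1}{\rho} + 1\right)^2 (2\rho + 1)[Q(1) - T_n]^2 + O(R^*(t_n^*)), \\
v_{33} &= \rho^2(\rho + 1)[Q(1) - T_n]^2 (t_n^*)^{2\rho}\left[(\rho + 1)(\ln t_n^*)^2 - 2\ln t_n^* + 2\right] + O(R^*(t_n^*)), \\
v_{12} = v_{21} &= -\left(\frac{1}{\rho} + 1\right)(2\rho + 1)[Q(1) - T_n] + O(R^*(t_n^*)), \\
v_{13} = v_{31} &= \rho(\rho + 1)[Q(1) - T_n](t_n^*)^{\rho}\left[1 - (\rho + 1)\ln t_n^*\right] + O(R^*(t_n^*)), \\
v_{23} = v_{32} &= (\rho + 1)(2\rho + 1)[Q(1) - T_n]^2 (t_n^*)^{\rho}(\ln t_n^* - 1) + O(R^*(t_n^*)).
\end{aligned}$$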


PROOF. Attention will first be paid to the pair $(\hat\rho, \hat{Q}(1))$ since $\hat{A}$ is a
function of these two parameters. Write

dCST(p>Q(  1);X) 

dp 

and 

dC’sr{e,Q(iy,X) 

3QW 

as sums of the independent exceedences $X_i - T_n$ given $X_i > T_n$. That is,

ac‘sr(/>,Q(i);X)  [ni;|  +  i  1  f,  x(n-.  +  i;n)-r„i 

dp  ~  *  m  | 1  Q(\ )-T- 


»=1 


Q(l)  -  Tn 


1  =  1 


fjsifogWj*)  K1 

3<3(1)  />(«(!)  -  Tn  1 


.  .  Kl 
\  1  Y" 

X(n  -  i  +  1;  n)  -  Tn 

'  Q(  1)  -  Tn 

[  Q(l)  -  Tn  J 

l<l  j-  j 

=  ^  MW)  -  r„] 


-1 
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V/>  /  0(i) -r„  [  o(i) -r„J  / 


=  ^*(o,  0(1);  Jfi-r.) 

t  =  l 

Therefore, Hall's estimates $(\hat\rho_n, \hat{Q}_n(1))$ can be expressed as the solution to the
estimating equation

K 1  [0l(Pn,Qn‘(l);Xt-Tn) 

o = (ior1/2  E 

.=i  -  r„) 

Take the first order Taylor's series expansion about the true parameter values
$(\rho, Q(1))$ of the right hand side to obtain, for some point $(\rho^*, Q^*(1))$ on the line
segment between $(\hat\rho_n, \hat{Q}_n(1))$ and $(\rho, Q(1))$,

«]  L(p,Q(l);*t-T„) 

o  =(Kl)-‘/2  E 

•=i  -r„) 


(<1  dp  3Q{  1) 

+ (Mir1  E 

i-i  Wz(i»‘.0,(l);^-rn)  3^(p'.0‘(l);^-r„) 

>0(1) 


•  (K!),/2 


Pn  ~  P 

«n‘(l)  -  0(1) 


From the Central Limit Theorem for an iid sequence,


I<1  tMp.0(l);Xi-r„) 

(K1)'1/2E  “  AN(M„,((nt;i)-‘E„) 

tM/>,0(i);X,-r„) 
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as  nt*  — ►  oo,  where 


Vn  =  E  X-Tn  I  X>Tn 


Mf>MiY,Xi-Tn) 

Mf>Mi)\Xi-Tn) 


■(P[nt*n])-'  +  0(R*(t*n)) 

o(fi*K)) 


■o(**(C))' 

O(X'K)). 


and 

En  =  Cov  x-Tn  |  X>Tn 


/ 

Mr,QW'Xi-Tn) 

\ 

V 

Me.QW.Xi-Tn) 

•  m 

i 

MP.QW'.Xi-Tn) 

7 

>11  <r12l 

,a2i  a22  J 

with 

"“=Mi-i4)+0(J!'(,;)) 

~±  +  o(r\Q), 

p 

°2S=(^  +  ^)  WW-M-’  +  oi™ 

~  -  r„r2  +  0(«*(«;)), 

0\2  =  ^21 

=  ^(7TI)IQ(1)  ■  rn|’1  + 

This result follows from the expressions for the moments given in Theorem 2.4.1.

For n sufficiently large to make the $O(R^*(t_n^*))$ terms negligible, notice that
$\Sigma_n$ is positive definite if and only if $-\frac{1}{2} < \rho < 0$.

If $\Sigma_n$ is positive definite, then from the Weak Law of Large Numbers and
the fact that $(\rho^*, Q^*(1)) \xrightarrow{P} (\rho, Q(1))$ as $nt_n^* \to \infty$, it follows that


(Kir1 


d^{p\Q*(iytXi-Tn) 

dp 

dMp*,Q*iiyxj-Tn) 

dp 


drl,x(p\Q*(iy,Xx-Tn) 

dQ{  1) 

d^p\Q*{l)\Xx-Tn) 
dQ(  1) 


0 

1 


as $nt_n^* \to \infty$. This result follows from the expressions of the derivatives for the
estimating equations given in Appendix F and the moments given in Theorem
2.4.1.

The asymptotic distribution of $(\hat\rho_n, \hat{Q}_n(1))$ then follows from Slutsky's
Theorem. Since $\hat{A}_n$ is a function of $(\hat\rho_n, \hat{Q}_n(1))$, the result of the theorem follows.
□


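It is not immediately obvious that the tail estimates motivated from the
parameterization in this section fit into that given in (2.3.5-2.3.7). However,
notice that for $1 - t_n^* < u < 1$,

$$\begin{aligned}
\hat{Q}(u) &= \hat{Q}(1) - (1 - u)^{-\hat\rho} \cdot (\hat{A}/(-\hat\rho)) \\
&= \hat{Q}(1) - \left(\frac{1-u}{t_n^*}\right)^{-\hat\rho}[\hat{Q}(1) - T_n] \\
&= T_n - (\hat{Q}(1) - T_n)\left[\left(\frac{1-u}{t_n^*}\right)^{-\hat\rho} - 1\right] \\
&= T_n + (-\hat\rho[\hat{Q}(1) - T_n]) \cdot \frac{1}{\hat\rho}\left[\left(\frac{1-u}{t_n^*}\right)^{-\hat\rho} - 1\right]
\end{aligned}$$

and for $T_n < x < \hat{Q}(1)$,

$$\begin{aligned}
\hat{F}(x) &= 1 - [\hat{Q}(1) - x]^{-1/\hat\rho} \cdot (-\hat\rho/\hat{A})^{-1/\hat\rho} \\
&= 1 - [\hat{Q}(1) - x]^{-1/\hat\rho}\left[\frac{-\hat\rho}{-\hat\rho(t_n^*)^{\hat\rho}[\hat{Q}(1) - T_n]}\right]^{-1/\hat\rho} \\
&= 1 - t_n^*\left[\frac{\hat{Q}(1) - x}{\hat{Q}(1) - T_n}\right]^{-1/\hat\rho} \\
&= 1 - t_n^*\left[1 - \frac{x - T_n}{\hat{Q}(1) - T_n}\right]^{-1/\hat\rho} \\
&= 1 - t_n^*\left[1 + \hat\rho\,\frac{x - T_n}{-\hat\rho[\hat{Q}(1) - T_n]}\right]^{-1/\hat\rho}
\end{aligned}$$

and

$$\begin{aligned}
\hat{f}(x) &= \left(-\frac{1}{\hat\rho}\right)[\hat{Q}(1) - x]^{-(1/\hat\rho)-1} \cdot (-\hat\rho/\hat{A})^{-1/\hat\rho} \\
&= \left(-\frac{1}{\hat\rho}\right)[\hat{Q}(1) - x]^{-(1/\hat\rho)-1}\left[\frac{-\hat\rho}{-\hat\rho(t_n^*)^{\hat\rho}[\hat{Q}(1) - T_n]}\right]^{-1/\hat\rho} \\
&= \frac{t_n^*}{-\hat\rho[\hat{Q}(1) - T_n]}\left[1 - \frac{x - T_n}{\hat{Q}(1) - T_n}\right]^{-(1/\hat\rho)-1}.
\end{aligned}$$

Therefore, the tail estimates based on the parameter estimates from this
parameterization of the tail follow under the unified approach proposed in Subsection
2.3 where $\hat\alpha = -\hat\rho[\hat{Q}(1) - T_n]$. For comparison with other parameter estimates,
the following corollary is easily shown.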

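COROLLARY 5.2.1. Suppose the assumptions of Theorem 5.2.1 hold. Then
conditional on $T_n$,

$$\begin{bmatrix} \hat\rho_n \\ \hat\alpha_n \end{bmatrix}
\text{ is } AN\left(
\begin{bmatrix} \rho + O(R^*(t_n^*)) \\ \alpha + O(R^*(t_n^*)) \end{bmatrix},\;
([nt_n^*])^{-1}V^*_{ST}\right)$$

as $nt_n^* \to \infty$, where $\alpha = -\rho[Q(1) - T_n]$ and

$$V^*_{ST} = \begin{bmatrix} v^*_{11} & v^*_{12} \\ v^*_{21} & v^*_{22} \end{bmatrix}$$

with

$$\begin{aligned}
v^*_{11} &= (\rho + 1)^2 + O(R^*(t_n^*)), \\
v^*_{22} &= 2\rho^2(\rho + 1)[Q(1) - T_n]^2 + O(R^*(t_n^*)) \\
&= 2\alpha^2(\rho + 1) + O(R^*(t_n^*)), \\
v^*_{12} = v^*_{21} &= \rho(\rho + 1)[Q(1) - T_n] + O(R^*(t_n^*)) \\
&= -\alpha(\rho + 1) + O(R^*(t_n^*)).
\end{aligned}$$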

5.3  Estimation  Assuming  The  Class  Of  Tail  Behavior  Is  Long  Tailed 

To obtain tail estimates when the tail behavior is known to be long tailed,
choose, as a function of $n$, a threshold percentile $t_n$. Then let the threshold value
be given by $T_n = \tilde{Q}(1 - t_n)$. Parameter estimates are obtained by assuming that
the underlying distribution satisfies

(5.3.1) $Q(u) = (1 - u)^{-\rho} \cdot A/\rho$, $1 - t_n < u < 1$,

(5.3.2) $F(x) = 1 - x^{-1/\rho} \cdot (\rho/A)^{-1/\rho}$, $x > \tilde{Q}(1 - t_n)$,

(5.3.3) $f(x) = (1/\rho)x^{-(1/\rho)-1} \cdot (\rho/A)^{-1/\rho}$, $x > \tilde{Q}(1 - t_n)$,

where $\rho > 0$ and $A > 0$ are unknown parameter values.

The most popular estimate of the tail exponent $\rho$ was proposed by Hill
(1975), who derived parameter estimates based on the largest order statistics.
Stated in terms of the exceedence over threshold approach, the $[nt_n] + 1$ largest
order statistics are given by

$$\tilde{Q}(1 - t_n) = T_n = X(n - [nt_n]; n) < X(n - [nt_n] + 1; n) < \cdots < X(n; n).$$

If the underlying distribution satisfies (5.3.1-5.3.3), the log-likelihood of the
$[nt_n] + 1$ largest order statistics is given by

$$\begin{aligned}
\ell_{LT}(\rho, A; X) ={}& \ln\frac{n!}{(n - [nt_n] - 1)!}
- ([nt_n] + 1)\left(\frac{1}{\rho} + 1\right)\ln\rho + ([nt_n] + 1)\frac{1}{\rho}\ln A \\
&- \left(\frac{1}{\rho} + 1\right)\sum_{i=1}^{[nt_n]+1}\ln X(n - i + 1; n) \\
&+ (n - [nt_n] - 1)\ln\left\{1 - \left[\frac{\rho}{A}\,X(n - [nt_n]; n)\right]^{-1/\rho}\right\}
\end{aligned}$$

where $\rho > 0$, and $A > 0$.

Solving $\partial\ell_{LT}(\rho, A; X)/\partial A = 0$ expresses the estimate $\hat{A}$ given the
parameter estimate $\hat\rho$:

$$\hat{A} = \hat\rho\,[t_n + (1/n)]^{\hat\rho}\,X(n - [nt_n]; n)$$

(5.3.4) $\hat{A} \approx \hat\rho\, t_n^{\hat\rho}\, X(n - [nt_n]; n).$

As in the maximization of the log-likelihood for the order statistics from a
short tailed distribution, the two parameter log-likelihood $\ell_{LT}$ is reduced to a
function of only $\rho$ by defining

$$\begin{aligned}
\ell^*_{LT}(\rho; X) ={}& \ell_{LT}(\rho, \hat{A}; X) \\
={}& \ln\left\{\frac{n!}{(n - [nt_n] - 1)!}\, t_n^{[nt_n]+1}(1 - t_n)^{n-[nt_n]-1}\right\} \\
&- ([nt_n] + 1)\ln\rho + ([nt_n] + 1)\frac{1}{\rho}\ln T_n
- \left(\frac{1}{\rho} + 1\right)\sum_{i=1}^{[nt_n]+1}\ln X(n - i + 1; n).
\end{aligned}$$

Therefore, Hill's estimate for the tail exponent $\rho$ is found by solving
$\partial\ell^*_{LT}(\rho; X)/\partial\rho = 0$ to obtain

$$\begin{aligned}
\hat\rho &= \frac{1}{[nt_n] + 1}\sum_{i=1}^{[nt_n]+1}\ln X(n - i + 1; n) - \ln X(n - [nt_n]; n) \\
&= \frac{1}{[nt_n] + 1}\sum_{i=1}^{[nt_n]}\left[\ln X(n - i + 1; n) - \ln X(n - [nt_n]; n)\right]
\end{aligned}$$

(5.3.5) $\hat\rho \approx \dfrac{1}{[nt_n]}\displaystyle\sum_{i=1}^{[nt_n]}\ln\left[\frac{X(n - i + 1; n)}{X(n - [nt_n]; n)}\right].$

The asymptotic distribution of Hill's estimate has been considered by many
authors. For example, Mason (1982) gives necessary and sufficient conditions for
Hill's estimate to converge almost surely or in probability to a constant; Davis
and Resnick (1984) motivate Hill's estimator from extreme value theory and
derive its asymptotic normality.
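Because Hill's estimate has a closed form, (5.3.5) and (5.3.4) translate into a few lines of code. The sketch below is illustrative only; the function name and the deterministic pseudo-sample are assumptions, and the data are required to be positive above the threshold so that the logarithms exist.

```python
import math

def hill_estimates(x, t_n):
    """Hill's estimate of rho from (5.3.5) and the scale estimate A from (5.3.4)."""
    s = sorted(x)
    n = len(s)
    k = int(n * t_n)                  # [n t_n], number of exceedences
    if k < 1 or k >= n:
        raise ValueError("t_n must leave at least one exceedence and one threshold value")
    threshold = s[n - k - 1]          # X(n - [n t_n]; n)
    if threshold <= 0:
        raise ValueError("threshold must be positive for the log ratios")
    rho_hat = sum(math.log(xi / threshold) for xi in s[n - k:]) / k
    A_hat = rho_hat * t_n ** rho_hat * threshold
    return rho_hat, A_hat, threshold

# Deterministic pseudo-sample from Q(u) = (1 - u)^(-rho) with rho = 0.5,
# for which A = rho in the parameterization (5.3.1).
n = 2000
x = [(1.0 - (i + 0.5) / n) ** -0.5 for i in range(n)]
rho_hat, A_hat, threshold = hill_estimates(x, t_n=0.1)
```

On this pseudo-sample both estimates land close to 0.5, as expected when the model (5.3.1) holds exactly above the threshold; with real data the choice of `t_n` drives the bias discussed later in the text.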

The following theorem on the asymptotic normality of $\hat\rho$ is similar to one
given by Goldie and Smith (1987), who also place rates of convergence on the
slowly varying function and derive the asymptotic normality of Hill's estimate.
The key difference in the theorem given here and Goldie and Smith's result
is that the bias due to the threshold value is expressed in terms of rates of
convergence. Also different from Goldie and Smith is the proof, which uses
(2.2.1), the representation for the quantile function of the exceedences.


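THEOREM 5.3.1. Suppose that $fQ(1-u) = u^{\rho+1}L(u)$, where $\rho > 0$ and

$$\left|\frac{L(t)}{L(tu)} - 1\right| \le A(u)R(t)$$

for some positive measurable functions $A(u)$ and $R(t)$ where $\lim_{t \to 0^+} R(t) = 0$.
Let $\{T_n\}$ be a sequence of thresholds defined on $(Q(0), Q(1))$ such that $nt_n^* \to \infty$
and $t_n^* \to 0$ as $n \to \infty$, where $t_n^* = 1 - F(T_n)$. Further, suppose $\rho T_n hQ(1 - t_n^*) =
1 + O(R_1(t_n^*))$ for some positive measurable function $R_1(t)$ with $\lim_{t \to 0^+} R_1(t) =
0$. Let $(\hat\rho_n, \hat{A}_n)$ denote Hill's estimates based on the largest order statistics from
a long tailed probability model given in (5.3.5) and (5.3.4). Then conditional on
$T_n$,

$$\begin{bmatrix} \hat\rho_n \\ \hat{A}_n \end{bmatrix}
\text{ is } AN\left(
\begin{bmatrix} \rho + O(R^*(t_n^*)) \\ \rho(t_n^*)^{\rho}T_n + O(R^*(t_n^*)) \end{bmatrix},\;
([nt_n^*])^{-1}V_{LT}\right)$$

as $nt_n^* \to \infty$, where $R^*(t) = \max\{R(t), R_1(t)\}$ and

$$V_{LT} = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix}$$

with

$$\begin{aligned}
v_{11} &= \rho^2 + O(R^*(t_n^*)), \\
v_{22} &= \rho^2(t_n^*)^{2\rho}(\rho\ln t_n^* + 1)^2 T_n^2 + O(R^*(t_n^*)), \\
v_{12} = v_{21} &= \rho^2(t_n^*)^{\rho}(\rho\ln t_n^* + 1)T_n + O(R^*(t_n^*)).
\end{aligned}$$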


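PROOF. Attention will first be paid to the distribution of $\hat\rho$ since $\hat{A}$ is a
function of $\hat\rho$. First, write $\hat\rho$ as the sum of the independent exceedences $X_i - T_n$
given $X_i > T_n$,

$$\hat\rho = ([nt_n^*])^{-1}\sum_{i=1}^{[nt_n^*]}\ln\left[\frac{X(n - i + 1; n)}{X(n - [nt_n^*]; n)}\right]
= ([nt_n^*])^{-1}\sum_{i=1}^{[nt_n^*]}\ln\left(1 + \frac{X_i - T_n}{T_n}\right).$$

From the Central Limit Theorem for an iid sequence,

$$([nt_n^*])^{-1}\sum_{i=1}^{[nt_n^*]}\ln\left(1 + \frac{X_i - T_n}{T_n}\right) \text{ is } AN\left(\mu_n,\ ([nt_n^*])^{-1}\sigma_n^2\right)$$

as $nt_n^* \to \infty$, where

$$\mu_n = E_{X-T_n \mid X>T_n}\ln\left(1 + \frac{X - T_n}{T_n}\right) = \rho + O(R^*(t_n^*))$$

and

$$\sigma_n^2 = \mathrm{Var}_{X-T_n \mid X>T_n}\ln\left(1 + \frac{X - T_n}{T_n}\right) = \rho^2 + O(R^*(t_n^*)).$$

This result follows from the expressions for the moments given in Theorem 2.4.1.

Since $\hat{A}_n$ is a function of $\hat\rho_n$, the result of the theorem follows. □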


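It is not immediately obvious that the tail estimates motivated from the
parameterization in this section fit into that given in (2.3.5-2.3.7). However,
notice that for $1 - t_n^* < u < 1$,

$$\begin{aligned}
\hat{Q}(u) &= (1 - u)^{-\hat\rho} \cdot (\hat{A}/\hat\rho) \\
&= \left(\frac{1-u}{t_n^*}\right)^{-\hat\rho} T_n \\
&= T_n + (\hat\rho T_n) \cdot \frac{1}{\hat\rho}\left[\left(\frac{1-u}{t_n^*}\right)^{-\hat\rho} - 1\right]
\end{aligned}$$

and for $x > T_n$,

$$\begin{aligned}
\hat{F}(x) &= 1 - x^{-1/\hat\rho} \cdot (\hat\rho/\hat{A})^{-1/\hat\rho} \\
&= 1 - x^{-1/\hat\rho}\left[\frac{\hat\rho}{\hat\rho(t_n^*)^{\hat\rho}T_n}\right]^{-1/\hat\rho} \\
&= 1 - t_n^*\left[\frac{x}{T_n}\right]^{-1/\hat\rho} \\
&= 1 - t_n^*\left[1 + \frac{x - T_n}{T_n}\right]^{-1/\hat\rho}
\end{aligned}$$

and

$$\begin{aligned}
\hat{f}(x) &= \left(\frac{1}{\hat\rho}\right)x^{-(1/\hat\rho)-1} \cdot (\hat\rho/\hat{A})^{-1/\hat\rho} \\
&= \left(\frac{1}{\hat\rho}\right)x^{-(1/\hat\rho)-1}\left[\frac{\hat\rho}{\hat\rho(t_n^*)^{\hat\rho}T_n}\right]^{-1/\hat\rho} \\
&= \left(\frac{t_n^*}{\hat\rho T_n}\right)\left[1 + \frac{x - T_n}{T_n}\right]^{-(1/\hat\rho)-1}.
\end{aligned}$$

Therefore, the tail estimates based on the parameter estimates from this
parameterization of the tail follow under the unified approach proposed in Subsection 2.3
where $\hat\alpha = \hat\rho T_n$. For comparison with other parameter estimates, the following
corollary is easily shown.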




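COROLLARY 5.3.1. Suppose the assumptions of Theorem 5.3.1 hold. Then
conditional on $T_n$,

$$\begin{bmatrix} \hat\rho_n \\ \hat\alpha_n \end{bmatrix}
\text{ is } AN\left(
\begin{bmatrix} \rho + O(R^*(t_n^*)) \\ \rho T_n + O(R^*(t_n^*)) \end{bmatrix}
= \begin{bmatrix} \rho + O(R^*(t_n^*)) \\ \alpha + O(R^*(t_n^*)) \end{bmatrix},\;
([nt_n^*])^{-1}V^*_{LT}\right)$$

as $nt_n^* \to \infty$, where

$$V^*_{LT} = \begin{bmatrix} \rho^2 + O(R^*(t_n^*)) & \rho^2 T_n + O(R^*(t_n^*)) \\ \rho^2 T_n + O(R^*(t_n^*)) & \rho^2 T_n^2 + O(R^*(t_n^*)) \end{bmatrix}
= \begin{bmatrix} \rho^2 + O(R^*(t_n^*)) & \alpha\rho + O(R^*(t_n^*)) \\ \alpha\rho + O(R^*(t_n^*)) & \alpha^2 + O(R^*(t_n^*)) \end{bmatrix}.$$

6. DISCUSSION OF PARAMETER ESTIMATE PROPERTIES


6.1 Comparison Of Parameter Estimates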

All comparisons are made assuming the sequence of threshold values $\{T_n\}$
satisfies $nt_n^* \to \infty$ as $n \to \infty$, where $t_n^* = 1 - F(T_n)$. This assumption is
necessary for the asymptotic normality of the proposed parameter estimates.
The tail exponent estimators are the focus of this discussion since the estimates
of $\alpha$ show no important differences.

All of the parameter estimates proposed in Sections 3-5 are biased in general.
This is due to the difference between the true value for the tail and the model for
the tail used to obtain parameter estimates. Under the conditions of Theorems
3.2.1, 4.2.1, 5.2.1, 5.3.1,


Bias(pGPD)  =  0(R*(t*)),  -  \  <  p  <  oo,  p  #  0, 

Bias(pGEV)=  --(Pp~-42)2+Q(^X)).  -356967 

Bias(pHau)  =  0(R*(t*)),  <  P  < 

Bias(pniu)  =  0(R*(t*)),  p  >  0, 


PO  <  p  <  1,  p  7*  0, 


as  nt*  — *■  oo. 

Only the GEV estimate fails to be asymptotically unbiased under the
additional condition $t_n^* \to 0$ as $n \to \infty$. Therefore, the GEV estimate would be
inappropriate for very large samples. A comparison of the magnitudes of the
bias for small samples is a topic for further investigation.

No global statements can be made regarding the GPD estimates, Hall's
estimates, and Hill's estimates since they are not defined on a common range
of support. Assuming the underlying probability model is short tailed, the GPD
estimate and Hall's estimate are asymptotically equivalent since

$$\begin{aligned}
ARE(\hat\rho_{GPD}, \hat\rho_{Hall}) &= \lim_{n \to \infty}\frac{\mathrm{Var}(\hat\rho_{GPD})}{\mathrm{Var}(\hat\rho_{Hall})} \\
&= \lim_{n \to \infty}\frac{([nt_n^*])^{-1}\{(\rho + 1)^2 + O(R^*(t_n^*))\}}{([nt_n^*])^{-1}\{(\rho + 1)^2 + O(R^*(t_n^*))\}} \\
&= 1, \qquad -\tfrac{1}{2} < \rho < 0.
\end{aligned}$$

Assuming the underlying probability model is long tailed, Hill's estimate is
preferred since

$$\begin{aligned}
ARE(\hat\rho_{GPD}, \hat\rho_{Hill}) &= \lim_{n \to \infty}\frac{([nt_n^*])^{-1}\{(\rho + 1)^2 + O(R^*(t_n^*))\}}{([nt_n^*])^{-1}\{\rho^2 + O(R^*(t_n^*))\}} \\
&= \left(\frac{\rho + 1}{\rho}\right)^2 \\
&> 1, \qquad \rho > 0.
\end{aligned}$$

This gives an example of the improved estimation possible through incorporating
additional assumptions into the probability model. The IQ box plot discussed
in Subsection 5.1 is one data analytic tool for validating assumptions on tail
behavior.
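To give a feel for the size of this efficiency gain, the limiting variance ratio of the GPD estimate to Hill's estimate, $((\rho + 1)/\rho)^2$, can be tabulated for a few tail exponents. The function name and the chosen values of $\rho$ are illustrative assumptions.

```python
def are_gpd_vs_hill(rho):
    """Limiting ratio Var(GPD estimate) / Var(Hill estimate) for rho > 0."""
    if rho <= 0:
        raise ValueError("the comparison applies only to the long tailed case rho > 0")
    return ((rho + 1.0) / rho) ** 2

# The ratio grows without bound as rho decreases toward zero.
table = {rho: are_gpd_vs_hill(rho) for rho in (0.25, 0.5, 1.0, 2.0)}
```

For $\rho = 1$ (a Cauchy-like tail) the ratio is 4, so the GPD estimate needs roughly four times as many exceedences to match the asymptotic variance of Hill's estimate; for $\rho = .25$ the ratio is 25.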


6.2  Interpretation  Of  Parameter  Estimates 

The majority of the interest in tail behavior is in the tail exponent $\rho$ because
it describes important properties of the underlying distribution. For example,
many statistical estimators make the assumption that the underlying distribution
has finite variance, which corresponds to assuming $\rho < \frac{1}{2}$. Also of interest is
the existence of a finite upper endpoint for the underlying distribution, which
corresponds to assuming the short tailed class where $\rho < 0$. Davison (1984)
proposes modeling the tail exponent as $\rho = \theta' x_j$ for a vector of design constants
or covariates in order to allow comparisons of the tail behavior for different
populations or pool information from related populations.


The focus of attention on the parameter estimates for interpretation is
natural, but for the most part unjustified. The confusion over parameter estimate
interpretation is twofold. First, the parameters defining tail behavior are
nonidentifiable and cannot be estimated without bias. Second, the true parameter
value may not provide the best fit to the observed tail over the range of interest.

6.2.1  Parameters  Are  Nonidentifiable.  One  important  complication  to  the 
interpretation  of  the  parameter  estimates  is  due  to  the  bias  in  the  estimates. 
This  bias  is  not  due  to  any  failure  of  the  methodology.  Instead,  it  is  due  to  the 
failure  of  a  parametric  model  to  completely  specify  the  underlying  distribution. 

Whether p is estimated by the GPD modeling or, depending on the appropriate
classification of the underlying distribution, by Hall’s estimate or Hill’s
estimate, asymptotically $E(\hat p) = p + O(R^*(t_n^*))$. While it is certainly
true that the bias goes to zero as $t_n \to 0^+$, the rate of convergence can be
very slow. Recall from the rate of convergence comment in Subsection 2.2 that one
possibility is $R^*(t) = -1/\ln t$, which would require an extremely large sample
size to obtain a negligible (but still present) bias.
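The slowness of a logarithmic rate can be made concrete with a little arithmetic. The sketch below evaluates R*(t) = -1/ln t at a few threshold percentiles and inverts it to find the percentile needed for a target rate. The function names are hypothetical, and the constant multiplying the rate in the bias is unknown, so the numbers indicate orders of magnitude only.

```python
import math


def log_rate(t: float) -> float:
    """R*(t) = -1 / ln t for a threshold percentile 0 < t < 1."""
    return -1.0 / math.log(t)


def percentile_for_rate(target: float) -> float:
    """Threshold percentile t at which R*(t) equals the target rate."""
    return math.exp(-1.0 / target)


if __name__ == "__main__":
    for t in (0.1, 0.01, 0.001, 1e-6):
        print(f"t = {t:8.0e}   R*(t) = {log_rate(t):.4f}")

    # Percentile needed to bring the rate down to 0.05, and the sample size
    # that would leave 100 exceedences above such a threshold (n * t = 100).
    t_needed = percentile_for_rate(0.05)
    print(f"t needed = {t_needed:.3e},  n for 100 exceedences = {100 / t_needed:.3e}")
```

Even a thousandfold reduction in the threshold percentile lowers this rate only modestly, which is the sense in which the bias may approach zero very slowly.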

It  is  popular  to  form  asymptotic  confidence  intervals  for  the  tail  exponent 
estimate  p  using  the  asymptotic  normality  property.  However,  when  estimating 
the  standard  error  of  p,  replacing  the  true  quantities  with  their  estimates  adds  a 
bias  to  the  estimated  standard  errors  since  the  parameter  estimates  are  biased. 
Therefore,  confidence  intervals  for  p  are  invalid  except  for  very  large  sample  sizes 
when  the  bias  is  sufficiently  small. 

6.2.2  The  Case  of  p  =  0.  Many  common  parametric  probability  models  are 
classified  as  medium  tailed.  For  example,  the  Normal,  Exponential,  Weibull, 
Logistic,  Lognormal,  and  Gumbel  (or  type  I  Extreme  Value)  distributions  have 
tail exponent p = 0. However, the case p = 0 has not been discussed in the
theoretical  discussion  of  the  proposed  tail  estimates. 

This is predominantly due to the widely different types of tail behavior
displayed by probability models in this class. For example, both the Normal and
Lognormal have p = 0, but from Figure 5 it is clear that the two distributions
require very different tail estimates.


FIG. 5. Graph of the quantile function for the Normal and Lognormal
distributions. Both these distributions have p = 0, but the tail from .75 to .99
should obviously be modeled differently.

The complication to the medium tailed class is that ultimately, the tail
exponent is zero. That is, far enough out in the tail, all the distributions in
the medium tailed class display the same tail behavior. In an attempt to
differentiate between the tail behavior before this ultimate convergence,
Schuster (1984) proposes the following subclasses:

(i) $p = 0$, $L(u) \to \infty$ as $u \to 0^+$ $\Rightarrow$ medium-short tails;

(ii) $p = 0$, $0 < \lim_{u \to 0^+} L(u) < \infty$ $\Rightarrow$ medium-medium tails;

(iii) $p = 0$, $L(u) \to 0$ as $u \to 0^+$ $\Rightarrow$ medium-long tails.
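The three subclasses can be illustrated numerically. With p = 0 the tail model fQ(1 - u) = u L(u) gives L(u) = fQ(1 - u)/u, so L can be evaluated directly for familiar models. The sketch below does this for the unstandardized Exponential, Normal, and Lognormal distributions; a location-scale change only rescales L by a constant and does not alter its limiting behavior. The function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def L_exponential(u: float) -> float:
    """Exponential: fQ(1 - u) = u, so L(u) is constant (medium-medium)."""
    return 1.0


def L_normal(u: float) -> float:
    """Normal: fQ(1 - u) = phi(z) with z = Q(1 - u); L grows as u -> 0+."""
    z = -_STD_NORMAL.inv_cdf(u)  # upper quantile, by symmetry
    return _STD_NORMAL.pdf(z) / u


def L_lognormal(u: float) -> float:
    """Lognormal: fQ(1 - u) = phi(z) / exp(z); L shrinks as u -> 0+."""
    z = -_STD_NORMAL.inv_cdf(u)
    return _STD_NORMAL.pdf(z) / (math.exp(z) * u)


if __name__ == "__main__":
    print("      u   exponential    normal   lognormal")
    for u in (1e-1, 1e-2, 1e-3, 1e-6):
        print(f"{u:7.0e}   {L_exponential(u):11.4f}  {L_normal(u):8.4f}  {L_lognormal(u):10.4f}")
```

Under these definitions the Normal falls in the medium-short subclass, the Exponential in the medium-medium subclass, and the Lognormal in the medium-long subclass, although the changes in L(u) are gradual over the range of u that a sample can reach.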

When  the  range  of  interest  is  not  the  extreme  end  of  the  tail,  a  penultimate 
model  may  better  fit  the  observed  tail.  The  ideas  of  ultimate  versus  penultimate 
behavior  have  long  been  recognized  in  extreme  value  theory.  It  is  not  surprising 
that  these  same  issues  occur  in  estimating  tail  behavior  since  the  justification 
for the tail behavior model $fQ(1 - u) = u^{p+1} L(u)$ follows from extreme value
theory. 

The penultimate approach to modeling tail behavior allows a nonzero value for
p which more closely approximates the observed tail on the region of interest.
This idea adds new problems to the identifiability of the true tail exponent p,
but greatly improves the tail estimates.

6.2.3 Summary of Parameter Interpretation. Because the parameters are
nonidentifiable and a value other than the true tail exponent may fit better to
the observed tail, there are serious complications in interpreting the parameter
estimates.

Therefore, one must choose which aspect of the tail is of interest and sample
accordingly. If the tail exponent p is the focus of the analysis, an extremely
large sample is required, both to make the bias negligible and to permit a
threshold extreme enough that the ultimate tail behavior is being modeled.

6.3  Effect  Of  Bias  In  Parameter  Estimates  On  Tail  Estimates 

If the tail estimates are of interest, the bias in the parameter estimates is
less likely to seriously alter the tail estimates. The estimators using the GPD
modeling, Hall’s estimators, and Hill’s estimators form an interesting
special case in which the bias is $O(R^*(t_n^*))$. The following theorem shows
that the bias in the tail estimates given in (2.3.5)-(2.3.7) is of the same rate
in this special case.


THEOREM 6.3.1. Suppose that $fQ(1 - u) = u^{p+1} L(u)$, where $p \in \mathbb{R}$,
$p \neq 0$, and

<  mm 

for some positive measurable functions $A(u)$ and $R(t)$ where
$\lim_{t \to 0^+} R(t) = 0$. Let $\{T_n\}$ be a sequence of thresholds on
$(Q(0), Q(1))$ such that $n t_n \to \infty$ and $t_n \to 0$ as $n \to \infty$,
where $t_n = 1 - F(T_n)$. Further, let $a = a(t_n^*)$ be the scalar value of a
function $a(\cdot)$ satisfying $a(t_n^*)\, hQ(1 - t_n^*) = 1 + O(R_1(t_n^*))$ for
some positive measurable function $R_1(t)$ with $\lim_{t \to 0^+} R_1(t) = 0$.
Let $(\hat p_n, \hat a_n)$ denote the parameter estimates. Suppose that
conditional on $T_n$,


m  . 

L(tu) 


is  AN 


V21  V22
as $n t_n \to \infty$ for scalars $v_{ij}$ such that the covariance matrix is
positive definite.

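(a) For $1 - t_n^* < u < 1$,

\[ Q^*(u) \text{ is } AN\left( Q(u) + c_Q(t_n^*, u),\; (n t_n^*)^{-1} \cdot \sigma_Q^2(u) \right) \]

where $|c_Q(t_n^*, u)| \le A^{**}(u)\, R^*(t_n^*)$ for some positive measurable
function $A^{**}(u)$, and $\sigma_Q^2(u)$ is given by (2.3.8).

(b) For $T_n < x < Q(1)$,

\[ F^*(x) \text{ is } AN\left( F(x) + c_F(t_n^*, x),\; (t_n^*/n) \cdot \sigma_F^2(x) \right) \]

where $|c_F(t_n^*, x)| \le A^{**}(x)\, t_n^* R^*(t_n^*)$ for some positive
measurable function $A^{**}(x)$, and $\sigma_F^2(x)$ is given by (2.3.9).

(c) For $T_n < x < Q(1)$,

\[ f^*(x) \text{ is } AN\left( f(x) + c_f(t_n^*, x),\; (t_n^*/n) \cdot \sigma_f^2(x) \right) \]

where $|c_f(t_n^*, x)| \le A^{**}(x)\, t_n^* R^*(t_n^*)$ for some positive
measurable function $A^{**}(x)$, and $\sigma_f^2(x)$ is given by (2.3.10).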


The important result from this theorem is that the bias in the tail estimates
is of the same order as the bias in the parameter estimates. The bias is negligible
for $t_n$ sufficiently small, but the idea of ‘sending $t_n$ to zero’ is
unacceptable in applications since this effectively removes the desire to estimate
the tail. Therefore, some amount of bias must be accepted due to the very nature
of the problem.

7. THRESHOLD SELECTION

In order to reduce the bias in the parameter estimates proposed in this work,
the threshold percentile $t_n$ should be chosen so that
$\lim_{n \to \infty} t_n = 0$. However, the asymptotic results for these
estimators require that $\lim_{n \to \infty} n t_n = \infty$. Therefore, a
trade-off between bias and variance must be made when choosing the threshold
value.

Further, the concept of an ‘optimal’ threshold percentile can be viewed in one
of two ways. If the tail behavior parameters are the focus of the analysis, then
a threshold should be chosen which minimizes some criterion function for the
parameter estimate (such as MSE). However, if the tail estimates are the focus
of the analysis, then the threshold should be chosen which best fits the observed
tail.

7.1  Optimal  Thresholds  Based  On  The  Parameters 

Csörgő, Horváth, and Révész (1987) investigate the existence of optimal
sequences for the threshold for any estimate of the tail exponent. They conclude
that since a sequence for the threshold based on some optimality criterion
depends on the unknown tail exponent and slowly varying function, it is useless in
practice.

In  an  attempt  to  avoid  this  dependency  on  unknown  quantities,  Hall  and 
Welsh  (1985)  considered  Hill’s  estimator  and  proposed  a  parametric  model  for 
the  slowly  varying  function.  They  propose  an  optimal  threshold  sequence  t0pt  = 
C0(7,  A)  •  n~  1/2*r,  where  Co(7,A)  is  selected  according  to  some  criterion  such 
as  MSE,  under  the  assumption  L{u)  =  (l/pA)[l  -  ©u 7  +  o(u'y)],  for  7  >  0  and 
A  >  0.  Hall  and  Welsh  then  propose  estimates  for  7  and  A  which  are  used  to 
estimate  the  optimal  threshold  sequence.  This  procedure  is  highly  dependent 
on  the  quality  of  the  parameterization  for  L(u),  and  the  associated  parameter 
estimates performed quite poorly in their simulation study.

7.3 Optimal Thresholds Based On The Tail

When  the  focus  is  on  the  tails,  the  ‘best’  threshold  is  defined  as  the  one  that 
provides  a  tail  estimate  ‘closest’  to  the  observed  tail.  This  leads  to  a  data  analytic 
threshold  selection  procedure  where  the  observed  tail  dictates  the  threshold  value. 

The following algorithm is proposed for choosing a threshold percentile $t_n$
which best approximates the sample.

• For $t = 1/n, 2/n, \ldots, (n-1)/n$:

  1. Compute $F_t^*(x)$ based on the exceedences $X_j - Q^\sim(1 - t)$ given
     $X_j > Q^\sim(1 - t)$.

  2. Compute
     \[ d_t = \sup_{Q^\sim(1-t) < x \le Q^\sim(1)} | F^\sim(x) - F_t^*(x) |. \]

• Choose $t_{opt} = t$, where $t$ is the smallest solution to $d = \min_t d_t$.
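The search above can be sketched in code. The version below makes several assumptions that are not part of the source: the tail estimate F_t*(x) is a GPD fitted to the exceedences by the method of moments (a simple stand-in, not the estimators derived in this work), the fitted GPD is written in the common shape/scale form, and at least five exceedences are required so that the fit is defined. The data are the Feather River annual floods of Table 6, and all function names are hypothetical.

```python
import math

# Feather River annual floods, 1902-1960 (ft^3/sec), in year order (Table 6).
FEATHER = [
    42000, 102000, 118000, 81000, 128000, 230000, 16300, 140000, 31000, 75400,
    16400, 16800, 122000, 81400, 42400, 80400, 28200, 65900, 23400, 62300,
    36400, 22400, 42400, 64300, 55700, 94000, 185000, 14000, 80100, 11600,
    22600, 8860, 20300, 58600, 85400, 19200, 185000, 8080, 152000, 84200,
    110000, 108000, 24900, 60100, 54400, 45600, 36700, 16800, 46400, 92100,
    59200, 113000, 54800, 13000, 203000, 83100, 102000, 34500, 135000,
]


def fit_gpd_moments(exceedances):
    """Method-of-moments GPD fit (assumed stand-in); returns (shape, scale)."""
    k = len(exceedances)
    mean = sum(exceedances) / k
    var = sum((y - mean) ** 2 for y in exceedances) / (k - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)


def gpd_cdf(y, shape, scale):
    """GPD distribution function 1 - (1 + shape * y / scale)^(-1/shape)."""
    if y <= 0.0:
        return 0.0
    if abs(shape) < 1e-12:
        return 1.0 - math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    if base <= 0.0:  # beyond the finite upper endpoint of a short tailed fit
        return 1.0
    return 1.0 - base ** (-1.0 / shape)


def select_threshold(data, min_exceedances=5):
    """Return (t_opt, d_opt): the smallest t = k/n minimizing the sup distance."""
    xs = sorted(data)
    n = len(xs)
    best_t, best_d = None, math.inf
    for k in range(min_exceedances, n):  # k exceedences, t = k / n
        t = k / n
        threshold = xs[n - k - 1]  # sample quantile at 1 - t
        exceedances = [x - threshold for x in xs[n - k:]]
        if max(exceedances) == min(exceedances):
            continue  # degenerate tail, the moment fit is undefined
        shape, scale = fit_gpd_moments(exceedances)
        d_t = 0.0
        for i in range(n - k, n):
            fitted = 1.0 - t * (1.0 - gpd_cdf(xs[i] - threshold, shape, scale))
            # Compare with the sample distribution function at and just below xs[i].
            d_t = max(d_t, abs((i + 1) / n - fitted), abs(i / n - fitted))
        if d_t < best_d:  # strict inequality keeps the smallest minimizing t
            best_t, best_d = t, d_t
    return best_t, best_d


if __name__ == "__main__":
    t_opt, d_opt = select_threshold(FEATHER)
    print(f"t_opt = {t_opt:.3f}, d = {d_opt:.4f}")
```

Because a different tail estimator is used, the threshold chosen by this sketch need not agree with the thresholds reported for the estimators in this work.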

This algorithm chooses the threshold percentile which minimizes the
distance between the estimated distribution function and the sample distribution
function over the tail values. It is an attempt to satisfy the intuitive behavior
of the threshold percentile, where $t_n$ is small enough to make the bias small
and $n t_n$ is large enough so that the tail estimates are based on a large number
of observations. The properties of this threshold selection procedure warrant
further study.

8. MOTIVATING EXAMPLE

The importance of estimating the frequency and magnitude of flood
discharges in rivers is obvious. In most instances, the ‘average’ flood discharge
is of little interest. Instead, the focus of inference is on floods with small
probabilities of occurrence. This naturally leads to investigation of the tail of
the underlying distribution.

Frequently estimated quantities in hydrology are the ‘hundred year flood’ or
‘thousand year flood,’ i.e. the flood discharge such that a flood of this magnitude
(or greater) will occur once in 100 years or 1000 years (on average). These are
convenient synonyms for Q(.99) and Q(.999), respectively. Therefore, estimating
the tail of the underlying quantile function is the main objective of the analysis.
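The correspondence between a return period and a quantile level is a one-line calculation. The sketch below assumes, as is implicit in the text, that the annual floods are independent draws from one distribution, so that a T year flood is the quantile at level 1 - 1/T. The function names are hypothetical.

```python
def level_for_return_period(years: float) -> float:
    """Quantile level u such that Q(u) is exceeded once in `years` years on average."""
    if years <= 1.0:
        raise ValueError("the return period must exceed one year")
    return 1.0 - 1.0 / years


def return_period_for_level(u: float) -> float:
    """Average number of years between exceedences of Q(u), for 0 <= u < 1."""
    return 1.0 / (1.0 - u)


if __name__ == "__main__":
    for years in (10, 100, 1000):
        print(f"{years:5d} year flood  ->  Q({level_for_return_period(years):.3f})")
```

With 59 or 37 annual observations, both Q(.99) and Q(.999) lie at or beyond the edge of the observed range, which is why the choice of tail model matters so much.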

Two  data  sets  from  hydrology  have  been  obtained  from  the  flood  frequency 
analyses discussed in Pericchi and Rodriguez-Iturbe (1985). They consider two
rivers  in  the  United  States  that  are  of  importance  for  the  regions  in  which  they 
are  located  and  whose  floods  have  been  the  object  of  previous  calculations  for 
engineering  works.  The  first  data  set  is  taken  from  Benjamin  and  Cornell  (1970) 
and  contains  59  years  of  annual  floods  (1902-1960)  for  the  Feather  River  at 
Oroville,  California.  The  Feather  River  annual  floods  are  given  in  Table  6  along 
with  descriptive  statistics  for  location,  scale,  and  tail  behavior. 

The second data set is taken from Wood, Rodriguez-Iturbe, and Schaake
(1974)  and  contains  37  years  of  annual  floods  (1929-1965)  for  the  Blackstone 
River  at  Woonsocket,  Rhode  Island.  The  Blackstone  River  annual  floods  are 
given  in  Table  7  along  with  descriptive  statistics  for  location,  scale,  and  tail 
behavior. 

The  IQ  box  plots  for  the  Feather  River  data  and  the  Blackstone  River  data 
are  given  in  Figures  6a  and  6b,  respectively.  Both  distributions  are  skewed  and 
heavy  tailed,  as  is  expected  in  flood  data.  Large  values  in  the  annual  flood 
sequence  are  often  interpreted  as  outliers  since  they  are  difficult  to  model,  yet 
are  expected  due  to  nature  and  should  be  deleted  only  in  special  circumstances. 

Table 6

Annual floods (1902-1960) for the Feather River at Oroville, California taken
from Benjamin and Cornell (1970). Descriptive statistics for location, scale, and
right tail behavior are also given. Notice that the underlying probability model is
skewed and heavy tailed.

Year   Flood Discharge (ft³/sec)      Year   Flood Discharge (ft³/sec)
1902          42,000                  1932          22,600
1903         102,000                  1933           8,860
1904         118,000                  1934          20,300
1905          81,000                  1935          58,600
1906         128,000                  1936          85,400
1907         230,000                  1937          19,200
1908          16,300                  1938         185,000
1909         140,000                  1939           8,080
1910          31,000                  1940         152,000
1911          75,400                  1941          84,200
1912          16,400                  1942         110,000
1913          16,800                  1943         108,000
1914         122,000                  1944          24,900
1915          81,400                  1945          60,100
1916          42,400                  1946          54,400
1917          80,400                  1947          45,600
1918          28,200                  1948          36,700
1919          65,900                  1949          16,800
1920          23,400                  1950          46,400
1921          62,300                  1951          92,100
1922          36,400                  1952          59,200
1923          22,400                  1953         113,000
1924          42,400                  1954          54,800
1925          64,300                  1955          13,000
1926          55,700                  1956         203,000
1927          94,000                  1957          83,100
1928         185,000                  1958         102,000
1929          14,000                  1959          34,500
1930          80,100                  1960         135,000
1931          11,600

Measures of Center               Measures of Dispersion
Median:   58,600.0               Interquartile Range:   70,600.0
Mean:     70,265.1               Standard Deviation:    52,023.5

Measures of Right Tail Behavior
QI~(.90) = .5411      QI~(.95) = .8952      QI~(.99) = 1.0227

Table 7

Annual floods (1929-1965) for the Blackstone River at Woonsocket, Rhode Island
taken from Wood, Rodriguez-Iturbe, and Schaake (1974). Descriptive statistics
for location, scale, and right tail behavior are also given. Notice that the
underlying probability model is skewed and heavy tailed.

Year   Flood Discharge (ft³/sec)      Year   Flood Discharge (ft³/sec)
1929           4,570                  1948           5,810
1930           1,970                  1949           2,030
1931           8,220                  1950           3,620
1932           4,530                  1951           4,920
1933           5,780                  1952           4,090
1934           6,560                  1953           5,570
1935           7,500                  1954           9,400
1936          15,000                  1955          32,900
1937           6,340                  1956           8,710
1938          15,100                  1957           3,850
1939           3,840                  1958           4,970
1940           5,860                  1959           5,398
1941           4,480                  1960           4,780
1942           5,330                  1961           4,020
1943           5,310                  1962           5,790
1944           3,830                  1963           4,510
1945           3,410                  1964           5,520
1946           3,830                  1965           5,300
1947           3,150

Measures of Center               Measures of Dispersion
Median:    4,970.0               Interquartile Range:    1,960.0
Mean:      6,372.9               Standard Deviation:     5,276.7

Measures of Right Tail Behavior
QI~(.90) = .9541      QI~(.95) = 2.5587      QI~(.99) = 2.5842

FIG. 6. Identification quantile box plots for the data used as examples. The
identification quantile box plot is a truncated plot of the sample identification
quantile function with an overlaid reference line to help in assessing the shape of
the underlying distribution. Figure 6a describes the Feather River annual floods
for the years 1902-1960 measured at Oroville, California. Figure 6b describes the
Blackstone River annual floods for the years 1929-1965 measured at Woonsocket,
Rhode Island. For each data set, it appears the underlying distribution is skewed
and heavy tailed as is expected in flood data. The largest observations of the
Blackstone River data can be interpreted as outliers, but large observations in
flood data are certainly expected due to nature so deleting data points is acceptable
only under special circumstances.

The most important but also the most difficult problem in analysis is the
selection of an appropriate probability model for the annual floods. The implicit
assumption of the analysis is that the probability model fit to the observed annual
floods is valid beyond the observed range of values. Three common parametric
probability models used in flood frequency analysis are the Gumbel (also referred
to as the Type I Extreme Value) distribution, the Lognormal distribution, and the
Pearson Type III (also referred to as the three parameter gamma) distribution.

The  Gumbel  can  be  theoretically  motivated  since  the  annual  flood  can  be 
considered  as  the  maximum  of  many  independent  and  identically  distributed 
random variables from any population classified as medium tailed (e.g. normal,
exponential, Weibull, lognormal, logistic). The Lognormal and Pearson Type III
are  convenient  in  that  they  have  a  small  number  of  parameters  and  are  flexible 
in  fitting  the  data. 

Each of these probability models has been fit to the Feather River data and
the Blackstone River data, and superimposed on the sample quantile function Q~(u)
in Figures 7a and 7b, respectively. Notice that all three models appear
reasonable for describing the annual floods. In fact, using the Kolmogorov-Smirnov
goodness of fit test, none of the three models is rejected at α = .05.

Since  the  tail  of  the  quantile  function  is  the  focus  of  the  analysis,  consider 
Figures  8a  and  8b  which  graph  these  same  functions  on  the  upper  quartile.  For 
the  upper  tail,  these  ‘acceptable’  models  provide  very  different  estimates.  This  is 
not surprising since the estimated probability model was fit using the entire
sample and the model will best fit the center where the majority of the observations
lie. 

In  order  to  remove  the  influence  of  observations  at  the  center,  estimators 
based  on  the  exceedences  of  a  threshold  are  appropriate.  Further,  a  generally 
applicable  estimator,  free  of  the  restrictions  imposed  by  a  parametric  family,  will 
allow  the  data  to  dictate  the  tail  estimate.  Therefore,  the  tail  estimates  proposed 
in  this  work  will  provide  useful  tools  for  improving  the  analysis  of  annual  flood 
data. 

[Figure 7 panels: (A) Feather River Annual Floods (1902-60); (B) Blackstone
River Annual Floods (1929-65)]

FIG. 7. Graph of the estimated Gumbel model (solid line), Lognormal model
(dotted line), and Pearson Type III model (solid line with blocks) overlaid on the
sample quantile function (step function) for the data used as examples. Figure
7a is for the Feather River data and Figure 7b is for the Blackstone River data.
All three models appear reasonable for describing the annual floods, and all fail
to reject the Kolmogorov-Smirnov goodness of fit test at α = .05.

[Figure 8 panels: (A) Feather River Annual Floods (1902-60); (B) Blackstone
River Annual Floods (1929-65)]

FIG. 8. Graph of the estimated Gumbel model (solid line), Lognormal model
(dotted line), and Pearson Type III model (solid line with blocks) overlaid on
the sample quantile function (step function) on the upper quartile for the data
used as examples. Figure 8a is for the Feather River data and Figure 8b is
for the Blackstone River data. Notice that the three probability models provide
very different tail estimates. This is due to the use of the entire sample in the
estimation, forcing the model to fit better at the center of the distribution where
the majority of the observations lie.

Before computing the estimates proposed in this work for the Feather River
data and the Blackstone River data, the observations have been centered and
scaled by subtracting the median and dividing by twice the interquartile range.
The parameter estimates are not location-scale invariant, and a different
standardization will result in different parameter estimates. However, this is not
a serious issue since the tail estimates are the focus of the analysis and can be
obtained in the original units by correcting for the standardization.
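The standardization and its correction can be written out explicitly. The sketch below uses the median and interquartile range reported for the Feather River in Table 6 as fixed constants; the function names are hypothetical.

```python
# Summary statistics reported for the Feather River in Table 6 (ft^3/sec).
MEDIAN = 58_600.0
INTERQUARTILE_RANGE = 70_600.0
SCALE = 2.0 * INTERQUARTILE_RANGE  # twice the interquartile range


def standardize(x: float) -> float:
    """Center by the median and scale by twice the interquartile range."""
    return (x - MEDIAN) / SCALE


def to_original_units(z: float) -> float:
    """Correct a standardized tail estimate back to ft^3/sec."""
    return MEDIAN + SCALE * z


if __name__ == "__main__":
    # The 1960 flood of 135,000 ft^3/sec on the standardized scale.
    print(round(standardize(135_000.0), 4))
    # A standardized quantile estimate of 1.0227 expressed in original units.
    print(round(to_original_units(1.0227)))
```

The standardized value of the 135,000 ft³/sec flood agrees with the tabled right tail measure at u = .90, and any tail estimate computed on the standardized scale is returned to ft³/sec by the second function.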

8.1  Tail  Estimates  For  The  Feather  River 

The  parameter  estimates  for  the  GPD  and  GEV  modeling  of  the  exceedences 
are  given  in  Table  8.  Since  it  appears  reasonable  from  the  IQ  box  plot  of  the 
Feather  River  Data  to  classify  the  underlying  probability  model  as  short  tailed, 
Hall’s  parameter  estimates  are  also  appropriate  and  tabled.  A  further  implication 
of  this  classification  is  the  existence  of  a  finite  upper  bound  Q(l).  This  does 
not  imply  that  a  ‘maximum  flood  level’  exists,  but  that  the  best  approximating 
probability  model  has  such  an  upper  bound.  A  graphical  comparison  of  the 
different  parameter  estimators  is  given  in  Figure  9. 

8.2  Tail  Estimates  For  The  Blackstone  River 

The  parameter  estimates  for  the  GPD  and  GEV  modeling  of  the  exceedences 
are given in Table 9. Since it appears reasonable from the IQ box plot of the
Blackstone River Data to classify the underlying probability model as long tailed,
Hill’s parameter estimates are also appropriate and tabled. It is interesting to
note that the threshold selection algorithm chose the same threshold percentile
for each of the proposed tail estimators. However, even with the same threshold
percentile the parameter estimates are different. A graphical comparison of the
different parameter estimators is given in Figure 10.

Table 8

Table containing the optimal threshold percentile and parameter estimates for the
proposed tail estimates based on the exceedences of a threshold for the Feather
River annual floods. Parameters are estimated using the GPD modeling, the
GEV modeling, and Hall’s estimate for an underlying short tailed distribution.
The classification of the underlying distribution as short tailed follows from the
IQ box plot. Notice that since a short tailed probability model appears reasonable,
a finite upper bound Q(1) is estimated by the GPD estimate and Hall’s estimate.
The GEV estimate has Q*(1) = ∞ since p > 0.

Estimator      Threshold Percentile t    Estimate of p    Estimate of a    Q*(1)
GPD            20/59 ≈ .339                 -.259             .426         290,571 ft³/sec
GEV            23/59 ≈ .390                  .021             .254
Hall’s Est.    24/59 ≈ .407                 -.106             .341         528,615 ft³/sec

[Figure 9 panel: Feather River Annual Floods (1902-60)]

FIG. 9. Graph of the proposed tail estimates based on the exceedences of a
threshold for the Feather River annual floods on the upper quartile. The tail
estimate based on the GPD modeling (solid line), GEV modeling (dotted line),
and using Hall’s estimate for an underlying short tailed probability model (solid
line with blocks) are overlaid on the sample quantile function (step function) for
comparison.

Table 9

Table containing the optimal threshold percentile and parameter estimates for the
proposed tail estimates based on the exceedences of a threshold for the Blackstone
River annual floods. Parameters are estimated using the GPD modeling, the
GEV modeling, and Hill’s estimate for an underlying long tailed distribution.
The classification of the underlying distribution as long tailed follows from the
IQ box plot.

Estimator      Threshold Percentile t    Estimate of p    Estimate of a
GPD            18/37 ≈ .486                 1.100             .218
GEV            18/37 ≈ .486                 1.076             .131
Hill’s Est.    18/37 ≈ .486                 1.602             .135

[Figure 10 panel: Blackstone River Annual Floods (1929-65)]

FIG. 10. Graph of the proposed tail estimates based on the exceedences of a
threshold for the Blackstone River annual floods on the upper quartile. The tail
estimate based on the GPD modeling (solid line), GEV modeling (dotted line),
and using Hill’s estimate for an underlying long tailed probability model (solid
line with blocks) are overlaid on the sample quantile function (step function) for
comparison.

9. CONCLUDING REMARKS

Tail estimates of the underlying probability model are of interest in many
applications. In addition, the tail behavior of a probability model dictates
many theoretical properties with important implications for probability
modeling. Therefore, generally applicable tail estimates can serve as valuable
diagnostic tools in fitting probability models to data.

In  this  work,  tail  estimates  have  been  proposed  which: 

•  use  only  the  observations  in  the  tail,  and 

•  are  generally  applicable,  making  minimal  assumptions  on  the  underlying 
probability  model. 

Two  distinct  approaches  in  this  format  are  unified  by  modeling  the  density- 
quantile  function  as  a  regularly  varying  function  and  representing  the  quantile 
function  for  the  conditional  distribution  of  the  exceedences  of  a  threshold  as 
the  sum  of  a  parametric  function  and  an  analytic  error  function.  The  quantile 
representation  for  the  exceedences  is  the  key  to 

(1)  forming  a  parametric  model  for  the  tail  of  the  underlying  probability  model; 

(2)  motivating  methods  for  obtaining  parameter  estimates;  and 

(3) deriving the asymptotic properties of the proposed parameter estimates.

Parameter estimates may be obtained using a Generalized Pareto Distribution
(GPD) or a Generalized Extreme Value Distribution (GEV) modeling of the
exceedences. Assuming the underlying distribution can be correctly classified as
either short tailed or long tailed, other estimates can be formed.

The  unified  approach  allows  for  comparison  of  these  different  estimators. 
All  are  shown  to  be  biased,  and  no  global  statements  can  be  made  regarding  an 
‘optimal’  estimator. 

Much of the previous work on estimating tail behavior has focused solely
on the problem of parameter estimation. However, the parameters are shown to
be nonidentifiable and their estimators will always contain a bias which may be
non-negligible. In order to estimate the parameters with a reasonable amount of
precision, extremely large sample sizes are required.

In this work, the focus has been on obtaining tail estimates. In this scenario,
the bias in the parameter estimates causes a bias in the tail estimate of the same
order. Bias reduction is made at the cost of inflated variance. Therefore, some
compromise must be made.

To  demonstrate  the  tail  estimators  proposed  in  this  work,  two  sequences  of 
annual  floods  were  selected.  The  problem  of  probability  modeling  is  an  important 
issue  in  hydrology  since  the  estimate  of  the  tail  is  highly  dependent  on  the  model 
and  there  is  little  empirical  evidence  one  can  produce  to  support  a  given  model. 
Thus,  a  generally  applicable  approach  to  tail  estimation  where  the  data  dictates 
the form of the model is a useful diagnostic tool for further probability modeling.

APPENDIX A

COMMON  PARAMETRIC  PROBABILITY  MODELS 

This appendix contains the identification standardized versions of many
common parametric random variables. The identification transformation is useful in
comparing different types of tail behavior since the corresponding quantile
function equals zero and has slope approximately one at u = .5.

The identification transformation of a random variable Y with distribution
function F(y), density function f(y), and quantile function Q(u) is simply a
location-scale standardization. Let $\mu = Q(.5)$, the median, and
$\sigma = 2[Q(.75) - Q(.25)]$, the quartile deviation. Then make the transformation
$ZQI = (Y - \mu)/\sigma$, which results in the identification distribution function
$FI(z) = F(\mu + \sigma z)$, identification density function
$fI(z) = \sigma f(\mu + \sigma z)$, and identification quantile function
$QI(u) = (Q(u) - \mu)/\sigma$.
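The transformation can be checked numerically for any model with a known quantile function. The sketch below applies it to the unit Exponential distribution, for which the median is ln 2 and the scale is 2 ln 3; the helper name `identification_quantile` is hypothetical.

```python
import math


def identification_quantile(Q):
    """Return (QI, mu, sigma) for a quantile function Q.

    mu is the median Q(.5) and sigma is 2[Q(.75) - Q(.25)].
    """
    mu = Q(0.5)
    sigma = 2.0 * (Q(0.75) - Q(0.25))

    def QI(u):
        return (Q(u) - mu) / sigma

    return QI, mu, sigma


def exponential_quantile(u: float) -> float:
    """Quantile function of the unit Exponential distribution."""
    return -math.log(1.0 - u)


if __name__ == "__main__":
    QI, mu, sigma = identification_quantile(exponential_quantile)
    print(f"mu = {mu:.4f} (ln 2), sigma = {sigma:.4f} (2 ln 3)")
    # QI is zero at the median, and the quartiles are always 0.5 apart, which
    # is why the slope of QI near u = .5 is approximately one.
    print(QI(0.5), QI(0.75) - QI(0.25))
```

The same helper applies unchanged to the other models in this appendix once their unstandardized quantile functions are supplied.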

This appendix defines the distribution function, density function, quantile
function, and density-quantile function for the identification standardized
versions of some common parametric probability models. Graphs of these functions
are also given.

• The Uniform Distribution is given by

\[ F(x) = \begin{cases} 0, & x < -.5 \\ x + .5, & -.5 \le x \le .5 \\ 1, & x > .5, \end{cases} \]

with quantile function

\[ Q(u) = u - .5, \]

density function

\[ f(x) = \begin{cases} 1, & -.5 \le x \le .5 \\ 0, & \text{otherwise}, \end{cases} \]

and density-quantile function

\[ fQ(u) = 1. \]

These functions are plotted in Figure 11.

[Figure 11 panels: Distribution Function, F(x); Quantile Function, Q(u)]

FIG. 11. Uniform Distribution.

• The Negative Exponential Distribution is given by

\[ F(x) = \begin{cases} \tfrac{1}{2} e^{(2 \ln 3) x}, & x < \ln 2/(2 \ln 3) \\ 1, & x \ge \ln 2/(2 \ln 3), \end{cases} \]

with quantile function

\[ Q(u) = \frac{\ln 2u}{2 \ln 3}, \]

density function

\[ f(x) = \begin{cases} (\ln 3)\, e^{(2 \ln 3) x}, & x < \ln 2/(2 \ln 3) \\ 0, & x \ge \ln 2/(2 \ln 3), \end{cases} \]

and density-quantile function

\[ fQ(u) = (2 \ln 3) u. \]

These functions are plotted in Figure 12.

[Figure 12 panels: Density Function, f(x); Density-Quantile Function, fQ(u)]

FIG. 12. Negative Exponential Distribution.


• The Negative Weibull(p) Distribution, p > 0, (which is also referred to as
the Type III Extreme Value Distribution) is given by

\[
F(x) = \begin{cases} \exp\{-[(\ln 2)^{1/p} - \sigma x]^{p}\}, & x < (\ln 2)^{1/p}/\sigma \\ 1, & x \ge (\ln 2)^{1/p}/\sigma, \end{cases}
\]

with quantile function

\[ Q(u) = \frac{(\ln 2)^{1/p} - (-\ln u)^{1/p}}{\sigma}, \]

density function

\[
f(x) = \begin{cases} \sigma p\,[(\ln 2)^{1/p} - \sigma x]^{p-1} \exp\{-[(\ln 2)^{1/p} - \sigma x]^{p}\}, & x < (\ln 2)^{1/p}/\sigma \\ 0, & x \ge (\ln 2)^{1/p}/\sigma, \end{cases}
\]

and density-quantile function

\[ fQ(u) = \sigma p\, u\,(-\ln u)^{-(1/p)+1}, \]

where $\sigma = 2[(\ln 4)^{1/p} - (\ln 4 - \ln 3)^{1/p}]$. These functions are plotted in Figure
13.


• The Exponential Distribution is given by

\[
F(x) = \begin{cases} 0, & x < -\ln 2/(2\ln 3) \\ 1 - \tfrac{1}{2} e^{-(2\ln 3)x}, & x \ge -\ln 2/(2\ln 3), \end{cases}
\]

with quantile function

\[ Q(u) = -\frac{\ln[2(1-u)]}{2\ln 3}, \]

density function

\[
f(x) = \begin{cases} 0, & x < -\ln 2/(2\ln 3) \\ (\ln 3)\, e^{-(2\ln 3)x}, & x \ge -\ln 2/(2\ln 3), \end{cases}
\]

and density-quantile function

\[ fQ(u) = (2\ln 3)(1-u). \]

These functions are plotted in Figure 14.


FIG. 14. Exponential Distribution.

• The Logistic Distribution is given by

\[ F(x) = \frac{1}{1 + e^{-(4\ln 3)x}}, \]

with quantile function

\[ Q(u) = \frac{\ln u - \ln(1-u)}{4\ln 3}, \]

density function

\[ f(x) = \frac{(4\ln 3)\, e^{-(4\ln 3)x}}{(1 + e^{-(4\ln 3)x})^{2}}, \]

and density-quantile function

\[ fQ(u) = (4\ln 3)\,u(1-u). \]
These functions are plotted in Figure 15.

• The Normal Distribution is given by

\[ F(x) = \Phi(\sigma x), \]

with quantile function

\[ Q(u) = \Phi^{-1}(u)/\sigma, \]

density function

\[ f(x) = \sigma\phi(\sigma x), \]

and density-quantile function

\[ fQ(u) = \sigma\phi[\Phi^{-1}(u)], \]

where $\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt$, $\phi(x) = (1/\sqrt{2\pi})\,e^{-x^{2}/2}$ and $\sigma = 2[\Phi^{-1}(.75) - \Phi^{-1}(.25)] \approx 2.6979$. These functions are plotted in Figure 16.


FIG. 16. Normal Distribution.

• The Weibull(p) Distribution, p > 0, is given by

\[
F(x) = \begin{cases} 0, & x < -(\ln 2)^{1/p}/\sigma \\ 1 - \exp\{-[(\ln 2)^{1/p} + \sigma x]^{p}\}, & x \ge -(\ln 2)^{1/p}/\sigma, \end{cases}
\]

with quantile function

\[ Q(u) = \frac{[-\ln(1-u)]^{1/p} - (\ln 2)^{1/p}}{\sigma}, \]

density function

\[
f(x) = \begin{cases} 0, & x < -(\ln 2)^{1/p}/\sigma \\ \sigma p\,[(\ln 2)^{1/p} + \sigma x]^{p-1} \exp\{-[(\ln 2)^{1/p} + \sigma x]^{p}\}, & x \ge -(\ln 2)^{1/p}/\sigma, \end{cases}
\]

and density-quantile function

\[ fQ(u) = \sigma p\,(1-u)\,[-\ln(1-u)]^{-(1/p)+1}, \]

where $\sigma = 2[(\ln 4)^{1/p} - (\ln 4 - \ln 3)^{1/p}]$. These functions are plotted in Figure
17.


FIG. 17. Weibull(p) Distribution.

• The Lognormal Distribution is given by

\[ F(x) = \Phi[\ln(\sigma x + 1)], \]

with quantile function

\[ Q(u) = \frac{e^{\Phi^{-1}(u)} - 1}{\sigma}, \]

density function

\[ f(x) = \frac{\sigma}{\sigma x + 1}\,\phi[\ln(\sigma x + 1)], \]

and density-quantile function

\[ fQ(u) = \sigma\phi[\Phi^{-1}(u)]\,e^{-\Phi^{-1}(u)}, \]

where $\sigma = 2[e^{\Phi^{-1}(.75)} - e^{\Phi^{-1}(.25)}] \approx 2.9072$. These functions are plotted in
Figure 18.
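The two numerical scale constants quoted for the Normal and Lognormal Distributions can be reproduced with the Python standard library. This is only a sanity check on the quoted values; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Standard normal quartiles, Phi^{-1}(.75) and Phi^{-1}(.25).
z75 = NormalDist().inv_cdf(0.75)
z25 = NormalDist().inv_cdf(0.25)

# sigma = 2 [Q(.75) - Q(.25)] for the unstandardized quantile functions
# Q(u) = Phi^{-1}(u) (normal) and Q(u) = exp(Phi^{-1}(u)) (lognormal).
sigma_normal = 2.0 * (z75 - z25)
sigma_lognormal = 2.0 * (math.exp(z75) - math.exp(z25))

print(sigma_normal, sigma_lognormal)
```

Both computed values agree with the constants quoted in the text to within about 1e-4.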


FIG. 18. Lognormal Distribution.

FIG. 19. Cauchy Distribution.


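• The Pareto(p) Distribution, p > 0, is given by

\[
F(x) = \begin{cases} 0, & x < (1 - 2^{1/p})/\sigma \\ 1 - [2^{1/p} + \sigma x]^{-p}, & x \ge (1 - 2^{1/p})/\sigma, \end{cases}
\]

with quantile function

\[ Q(u) = \frac{(1-u)^{-1/p} - 2^{1/p}}{\sigma}, \]

density function

\[
f(x) = \begin{cases} 0, & x < (1 - 2^{1/p})/\sigma \\ \sigma p\,[2^{1/p} + \sigma x]^{-(p+1)}, & x \ge (1 - 2^{1/p})/\sigma, \end{cases}
\]

and density-quantile function

\[ fQ(u) = \sigma p\,(1-u)^{(1/p)+1}, \]

where $\sigma = 2 \cdot 4^{1/p}(1 - 3^{-1/p})$. These functions are plotted in Figure 20.


FIG. 20. Pareto(p) Distribution.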
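The closed form for the Pareto(p) scale constant can be checked against the general definition of the quartile deviation. The sketch below assumes the unstandardized Pareto has distribution function F(y) = 1 - y^(-p) for y >= 1, which is the form that reproduces the median 2^(1/p) appearing in the formulas above; the function names are illustrative.

```python
def pareto_sigma(p: float) -> float:
    """Closed form quoted in the text: sigma = 2 * 4^(1/p) * (1 - 3^(-1/p))."""
    return 2.0 * 4.0 ** (1.0 / p) * (1.0 - 3.0 ** (-1.0 / p))


def pareto_Q_raw(u: float, p: float) -> float:
    """Quantile function of the unstandardized Pareto (assumed form)."""
    return (1.0 - u) ** (-1.0 / p)


def pareto_QI(u: float, p: float) -> float:
    """Identification quantile function, (Q(u) - median) / sigma."""
    return (pareto_Q_raw(u, p) - 2.0 ** (1.0 / p)) / pareto_sigma(p)


def pareto_fQ(u: float, p: float) -> float:
    """Density-quantile function, sigma * p * (1 - u)^((1/p) + 1)."""
    return pareto_sigma(p) * p * (1.0 - u) ** (1.0 / p + 1.0)


def fQ_numeric(u: float, p: float, h: float = 1e-6) -> float:
    """Approximate 1 / QI'(u) with a central difference."""
    return 2.0 * h / (pareto_QI(u + h, p) - pareto_QI(u - h, p))


for p in (0.5, 1.0, 3.0):
    direct = 2.0 * (pareto_Q_raw(0.75, p) - pareto_Q_raw(0.25, p))
    print(p, direct, pareto_sigma(p), pareto_fQ(0.9, p), fQ_numeric(0.9, p))
```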


• The Fréchet(p) Distribution, p > 0, (which is also referred to as the Type II
Extreme Value Distribution) is given by

\[
F(x) = \begin{cases} 0, & x < -(\ln 2)^{-1/p}/\sigma \\ \exp\{-[(\ln 2)^{-1/p} + \sigma x]^{-p}\}, & x \ge -(\ln 2)^{-1/p}/\sigma, \end{cases}
\]

with quantile function

\[ Q(u) = \frac{(-\ln u)^{-1/p} - (\ln 2)^{-1/p}}{\sigma}, \]

density function

\[
f(x) = \begin{cases} 0, & x < -(\ln 2)^{-1/p}/\sigma \\ \sigma p\,[(\ln 2)^{-1/p} + \sigma x]^{-p-1} \exp\{-[(\ln 2)^{-1/p} + \sigma x]^{-p}\}, & x \ge -(\ln 2)^{-1/p}/\sigma, \end{cases}
\]

and density-quantile function

\[ fQ(u) = \sigma p\, u\,(-\ln u)^{(1/p)+1}, \]

where $\sigma = 2[(\ln 4 - \ln 3)^{-1/p} - (\ln 4)^{-1/p}]$. These functions are plotted in
Figure 21.


FIG. 21. Fréchet(p) Distribution.

PROOF OF THEOREM 2.3.1. Consider

\begin{align*}
hQ(1-t)\,[Q(1-tu) - Q(1-t)] &= \frac{fQ(1-t)}{t}\,[Q(1-tu) - Q(1-t)] \\
&= \frac{fQ(1-t)}{t} \int_{1-t}^{1-tu} \frac{1}{fQ(s)}\,ds \\
&= \int_{u}^{1} \frac{fQ(1-t)}{fQ(1-tz)}\,dz \\
&= \int_{u}^{1} \frac{t^{p+1}L(t)}{(tz)^{p+1}L(tz)}\,dz \\
&= \int_{u}^{1} z^{-p-1}\,dz + \int_{u}^{1} z^{-p-1}\left[\frac{L(t)}{L(tz)} - 1\right] dz \\
&= -\frac{1}{p}\,(1 - u^{-p}) + \epsilon(t,u,p).
\end{align*}

Since the quantile function of the exceedences over a threshold is

\[ Q_{X-T\,|\,X>T}(u;T) = Q(1 - t^{*}(1-u)) - Q(1 - t^{*}), \]

where $t^{*} = 1 - F(T)$, the theorem follows. □
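The representation in the proof can be illustrated numerically. The sketch below uses a Fréchet distribution with shape k (an illustrative choice, F(y) = exp(-y^(-k))), for which fQ(1 - u) behaves like u^(p+1) L(u) with p = 1/k and L(u) tending to a constant as u tends to 0+. It computes the remainder epsilon(t, u, p) as the difference between hQ(1-t)[Q(1-tu) - Q(1-t)] and (u^(-p) - 1)/p, and shows that it shrinks as t decreases. All function names are assumptions for illustration.

```python
import math


def frechet_Q_upper(s: float, k: float) -> float:
    """Q(1 - s) for F(y) = exp(-y^(-k)); log1p keeps accuracy for small s."""
    return (-math.log1p(-s)) ** (-1.0 / k)


def frechet_fQ_upper(s: float, k: float) -> float:
    """fQ(1 - s) = k (1 - s) [-ln(1 - s)]^(1 + 1/k)."""
    return k * (1.0 - s) * (-math.log1p(-s)) ** (1.0 + 1.0 / k)


def epsilon(t: float, u: float, k: float) -> float:
    """Remainder in hQ(1-t)[Q(1-tu) - Q(1-t)] = (u^(-p) - 1)/p + epsilon."""
    p = 1.0 / k
    hQ = frechet_fQ_upper(t, k) / t          # hazard-quantile, fQ(1-t)/t
    lhs = hQ * (frechet_Q_upper(t * u, k) - frechet_Q_upper(t, k))
    return lhs - (u ** (-p) - 1.0) / p


for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, epsilon(t, 0.5, 2.0))
```

For an exact Pareto tail the function L is constant and the remainder is identically zero, so the Fréchet case is used here only because it has a nonconstant L.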

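PROOF OF THEOREM 2.3.2(a). If $L(u)$ is a slowly varying function as
$u \to 0^{+}$, then by Potter's Theorem, for any constants $A > 1$, $\alpha > 0$, there exists
$T = T(A, \alpha)$ such that

\[ \frac{L(t)}{L(tz)} \le A \cdot \max\{z^{-\alpha}, z^{\alpha}\} \quad \text{for } 0 < t \le T,\ 0 < z \le 1. \]

Hence given $0 < \delta < 1$,

\[ z^{-p-1}\,\frac{L(t)}{L(tz)} \le A z^{-p-1} \max\{z^{-\alpha}, z^{\alpha}\}, \]

which is an integrable function on $(\delta, 1)$.
Consider, for $p \in \mathbb{R}$,

\begin{align*}
\epsilon(t,u,p) &\le \sup_{\delta \le u \le 1} \epsilon(t,u,p) \\
&= \sup_{\delta \le u \le 1} \int_{u}^{1} z^{-p-1}\left[\frac{L(t)}{L(tz)} - 1\right] dz \\
&= \int_{\delta}^{1} z^{-p-1}\,\frac{L(t)}{L(tz)}\,dz - \int_{\delta}^{1} z^{-p-1}\,dz \\
&\to \int_{\delta}^{1} z^{-p-1}\,dz - \int_{\delta}^{1} z^{-p-1}\,dz \quad \text{as } t \to 0^{+} \\
&= 0.
\end{align*}

Thus, $\epsilon(t,u,p) \to 0$ uniformly in $\delta \le u \le 1$. □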


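PROOF OF THEOREM 2.3.2(b). Under the assumption

\[ \left|\frac{L(t)}{L(tu)} - 1\right| \le A(u)R(t) \]

for some positive measurable functions $A(u)$ and $R(t)$ where $\lim_{t \to 0^{+}} R(t) = 0$,
then

\begin{align*}
|\epsilon(t,u,p)| &\le \int_{u}^{1} z^{-p-1}\left|\frac{L(t)}{L(tz)} - 1\right| dz \\
&\le \int_{u}^{1} z^{-p-1} \cdot A(z)R(t)\,dz \\
&= R(t) \cdot \int_{u}^{1} z^{-p-1} A(z)\,dz \\
&= A^{*}(u)R(t). \quad \square
\end{align*}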


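PROOF OF THEOREM 2.4.1. Since $fQ(1-u) = u^{p+1}L(u)$, where $p \in \mathbb{R}$,
$p \ne 0$ and $L(u)$ is slowly varying as $u \to 0^{+}$, the conditional distribution of
the exceedences can be written

\[ Q_{X-T\,|\,X>T}(u) = \frac{1}{hQ(1-t^{*})}\left[\frac{1}{p}\{(1-u)^{-p} - 1\} + \epsilon(t^{*}, 1-u, p)\right], \]

where $t^{*} = 1 - F(T)$.

Let $a = a(t^{*})$, the scalar value of the function $a(\cdot)$. Then

\begin{align*}
& E_{X-T\,|\,X>T}\left[1 + \frac{p}{a}(X-T)\right]^{\alpha}\left[\ln\left\{1 + \frac{p}{a}(X-T)\right\}\right]^{\beta} \\
&\quad= \int_{0}^{1}\left[1 + \frac{p}{a}\,Q_{X-T\,|\,X>T}(u)\right]^{\alpha}\left[\ln\left\{1 + \frac{p}{a}\,Q_{X-T\,|\,X>T}(u)\right\}\right]^{\beta} du \\
&\quad= \int_{0}^{1} (1-u)^{-p\alpha}\,\frac{1}{[a(t^{*})hQ(1-t^{*})]^{\alpha}}\,
\Bigl[1 + (1-u)^{p}\bigl\{p\,\epsilon(t^{*},1-u,p) + [a(t^{*})hQ(1-t^{*}) - 1]\bigr\}\Bigr]^{\alpha} \\
&\qquad\cdot \Bigl\{\ln(1-u)^{-p} - \ln a(t^{*})hQ(1-t^{*})
+ \ln\Bigl[1 + (1-u)^{p}\bigl\{p\,\epsilon(t^{*},1-u,p) + [a(t^{*})hQ(1-t^{*}) - 1]\bigr\}\Bigr]\Bigr\}^{\beta} du \\
&\quad= \int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta}\,\frac{1}{[a(t^{*})hQ(1-t^{*})]^{\alpha}}\,
\Bigl[1 + (1-u)^{p}\bigl\{p\,\epsilon(t^{*},1-u,p) + [a(t^{*})hQ(1-t^{*}) - 1]\bigr\}\Bigr]^{\alpha} \\
&\qquad\cdot \left\{1 + \frac{1}{\ln(1-u)^{-p}}\,
\ln\left[\frac{1 + (1-u)^{p}\bigl\{p\,\epsilon(t^{*},1-u,p) + [a(t^{*})hQ(1-t^{*}) - 1]\bigr\}}{a(t^{*})hQ(1-t^{*})}\right]\right\}^{\beta} du.
\end{align*}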

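If

\[ \left|\frac{L(t)}{L(tu)} - 1\right| \le A(u)R(t), \]

then $|\epsilon(t,u,p)| \le A^{*}(u)R(t)$. Also, it is assumed that $a(t^{*})hQ(1-t^{*}) = 1 + O(R_{1}(t^{*}))$. Hence,

\begin{align*}
& \left|\int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta} du
- E_{X-T\,|\,X>T}\left[1 + \frac{p}{a}(X-T)\right]^{\alpha}\left[\ln\left\{1 + \frac{p}{a}(X-T)\right\}\right]^{\beta}\right| \\
&\quad= \Biggl|\int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta} du
- \int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta}\,\frac{1}{[a(t^{*})hQ(1-t^{*})]^{\alpha}} \\
&\qquad\cdot \Bigl[1 + (1-u)^{p}\bigl\{p\,\epsilon(t^{*},1-u,p) + [a(t^{*})hQ(1-t^{*}) - 1]\bigr\}\Bigr]^{\alpha} \\
&\qquad\cdot \left\{1 + \frac{1}{\ln(1-u)^{-p}}\,
\ln\left[\frac{1 + (1-u)^{p}\bigl\{p\,\epsilon(t^{*},1-u,p) + [a(t^{*})hQ(1-t^{*}) - 1]\bigr\}}{a(t^{*})hQ(1-t^{*})}\right]\right\}^{\beta} du\Biggr| \\
&\quad\le \Biggl|\int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta} du
- \int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta}\,(1 + MR_{1}(t^{*}))^{-\alpha} \\
&\qquad\cdot \Bigl[1 + (1-u)^{p}\bigl[pA^{*}(1-u)R(t^{*}) + MR_{1}(t^{*})\bigr]\Bigr]^{\alpha} \\
&\qquad\cdot \left\{1 + \frac{1}{\ln(1-u)^{-p}}\,
\ln\left[\frac{1 + (1-u)^{p}\bigl[pA^{*}(1-u)R(t^{*}) + MR_{1}(t^{*})\bigr]}{1 + MR_{1}(t^{*})}\right]\right\}^{\beta} du\Biggr| \\
&\quad\le \Biggl|\int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta} du
- \int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta}\,(1 + M_{1}R_{1}(t^{*})) \\
&\qquad\cdot \Bigl(1 + M_{2}(1-u)^{p}\bigl[pA^{*}(1-u)R(t^{*}) + MR_{1}(t^{*})\bigr]\Bigr)
\cdot \left(1 + M_{3}\,\frac{1}{\ln(1-u)^{-p}}\,\ln(1 + MR_{1}(t^{*}))\right) \\
&\qquad\cdot \left(1 + M_{4}\,\frac{1}{\ln(1-u)^{-p}}\,
\ln\Bigl(1 + (1-u)^{p}\bigl[pA^{*}(1-u)R(t^{*}) + MR_{1}(t^{*})\bigr]\Bigr)\right) du\Biggr| \\
&\quad\le \left|M_{1}R_{1}(t^{*}) \int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta} du\right| \\
&\qquad+ \left|M_{2} \int_{0}^{1} (1-u)^{-p(\alpha-1)}\left[\ln(1-u)^{-p}\right]^{\beta}\bigl[pA^{*}(1-u)R(t^{*}) + MR_{1}(t^{*})\bigr] du\right| \\
&\qquad+ \left|M_{5}R_{1}(t^{*}) \int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta-1} du\right| \\
&\qquad+ \left|M_{6} \int_{0}^{1} (1-u)^{-p(\alpha-1)}\left[\ln(1-u)^{-p}\right]^{\beta-1}\bigl[pA^{*}(1-u)R(t^{*}) + MR_{1}(t^{*})\bigr] du\right| \\
&\quad\le M_{7}R^{*}(t^{*}),
\end{align*}

where $R^{*}(t^{*}) = \max\{R(t^{*}), R_{1}(t^{*})\}$ and $M_{i}$ are positive constants.

The other three expectations are found by changing the function $a(\cdot)$ and
following the same arguments. □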
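The limiting integral that appears in the bound above has a closed form in a common special case. The following worked equation is a sketch under the added assumptions $p > 0$, $p\alpha < 1$ and $\beta > -1$ (these restrictions are not stated in the proof and are imposed here only so that the integral converges and the Gamma function applies). Substituting $v = -\ln(1-u)$ gives $(1-u)^{-p\alpha} = e^{p\alpha v}$, $\ln(1-u)^{-p} = pv$ and $du = e^{-v}\,dv$.

```latex
\begin{align*}
\int_{0}^{1} (1-u)^{-p\alpha}\left[\ln(1-u)^{-p}\right]^{\beta} du
  &= \int_{0}^{\infty} (pv)^{\beta}\, e^{-(1-p\alpha)v}\, dv \\
  &= \frac{p^{\beta}\,\Gamma(\beta+1)}{(1-p\alpha)^{\beta+1}}.
\end{align*}
% Two special cases:
%   alpha = -1, beta = 0 gives 1/(1+p);
%   alpha =  0, beta = 1 gives p.
```

Both special cases match the corresponding moments of an exact Generalized Pareto variable $Y$ with shape $p$ and scale $a$, for which $\ln(1 + pY/a)$ is $p$ times a unit exponential variable.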


APPENDIX  D 

GRADIENT  AND  HESSIAN  OF  THE  GPD  LOG-LIKELIHOOD 


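Consider the space defined by $A = \{-1 < p < 0,\ a > -p \cdot Y([nt_n]; [nt_n])\} \cup \{p > 0,\ a > 0\}$. On the space $A$, the gradient vector of the GPD log-likelihood
has elements

\begin{align*}
\frac{\partial \ell_{GPD}(p,a;Y)}{\partial p}
&= -\frac{[nt_n]}{p}\left(\frac{1}{p} + 1\right)
+ \frac{1}{p^{2}} \sum_{i=1}^{[nt_n]} \ln\left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)
+ \frac{1}{p}\left(\frac{1}{p} + 1\right) \sum_{i=1}^{[nt_n]} \left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)^{-1}, \\
\frac{\partial \ell_{GPD}(p,a;Y)}{\partial a}
&= \frac{[nt_n]}{pa}
- \frac{1}{a}\left(\frac{1}{p} + 1\right) \sum_{i=1}^{[nt_n]} \left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)^{-1}.
\end{align*}

On the space $A$, the Hessian matrix of the GPD log-likelihood has elements

\begin{align*}
\frac{\partial^{2} \ell_{GPD}(p,a;Y)}{\partial p^{2}}
&= \frac{[nt_n]}{p^{2}}\left(\frac{3}{p} + 1\right)
- \frac{2}{p^{3}} \sum_{i=1}^{[nt_n]} \ln\left(1 + \frac{p\,Y(i;[nt_n])}{a}\right) \\
&\quad- \frac{2}{p^{2}}\left(\frac{2}{p} + 1\right) \sum_{i=1}^{[nt_n]} \left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)^{-1}
+ \frac{1}{p^{2}}\left(\frac{1}{p} + 1\right) \sum_{i=1}^{[nt_n]} \left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)^{-2}, \\
\frac{\partial^{2} \ell_{GPD}(p,a;Y)}{\partial a^{2}}
&= -\frac{[nt_n]}{pa^{2}}
+ \frac{1}{a^{2}}\left(\frac{1}{p} + 1\right) \sum_{i=1}^{[nt_n]} \left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)^{-2}, \\
\frac{\partial^{2} \ell_{GPD}(p,a;Y)}{\partial p\,\partial a}
&= -\frac{[nt_n]}{p^{2}a}
+ \frac{1}{pa}\left(\frac{2}{p} + 1\right) \sum_{i=1}^{[nt_n]} \left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)^{-1}
- \frac{1}{pa}\left(\frac{1}{p} + 1\right) \sum_{i=1}^{[nt_n]} \left(1 + \frac{p\,Y(i;[nt_n])}{a}\right)^{-2}.
\end{align*}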
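A gradient and Hessian of this kind can be checked against finite differences. The sketch below is not the author's code: it assumes the GPD log-likelihood of the exceedences has the form l(p, a) = -n ln a - (1/p + 1) * sum ln(1 + p y_i / a), with n the number of exceedences, which is consistent with the parameter space A above but is an assumption about the exact form used in the thesis. The function names and the sample values are hypothetical.

```python
import math


def gpd_loglik(p, a, y):
    """Assumed GPD log-likelihood: -n ln a - (1/p + 1) sum ln(1 + p y_i / a)."""
    return -len(y) * math.log(a) - (1.0 / p + 1.0) * sum(
        math.log1p(p * yi / a) for yi in y)


def _sums(p, a, y):
    w = [1.0 + p * yi / a for yi in y]
    return (sum(math.log(wi) for wi in w),
            sum(1.0 / wi for wi in w),
            sum(1.0 / wi ** 2 for wi in w))


def gpd_gradient(p, a, y):
    """Analytic (d/dp, d/da) of the assumed log-likelihood."""
    n = len(y)
    s_log, s1, _ = _sums(p, a, y)
    c = 1.0 / p + 1.0
    d_p = -n * c / p + s_log / p ** 2 + c * s1 / p
    d_a = n / (p * a) - c * s1 / a
    return d_p, d_a


def gpd_hessian(p, a, y):
    """Analytic (d2/dp2, d2/dpda, d2/da2) of the assumed log-likelihood."""
    n = len(y)
    s_log, s1, s2 = _sums(p, a, y)
    c = 1.0 / p + 1.0
    d_pp = (n / p ** 2) * (3.0 / p + 1.0) - 2.0 * s_log / p ** 3 \
        - (2.0 / p ** 2) * (2.0 / p + 1.0) * s1 + c * s2 / p ** 2
    d_pa = -n / (p ** 2 * a) + (2.0 / p + 1.0) * s1 / (p * a) - c * s2 / (p * a)
    d_aa = -n / (p * a ** 2) + c * s2 / a ** 2
    return d_pp, d_pa, d_aa


def numeric_gradient(p, a, y, h=1e-6):
    """Central-difference gradient of gpd_loglik, for comparison only."""
    d_p = (gpd_loglik(p + h, a, y) - gpd_loglik(p - h, a, y)) / (2.0 * h)
    d_a = (gpd_loglik(p, a + h, y) - gpd_loglik(p, a - h, y)) / (2.0 * h)
    return d_p, d_a


y = [0.1, 0.4, 0.7, 1.3, 2.9]          # hypothetical exceedences
print(gpd_gradient(0.3, 1.2, y), numeric_gradient(0.3, 1.2, y))
print(gpd_hessian(-0.2, 1.5, y))       # a > -p * max(y), so the point lies in A
```

Comparing analytic and finite-difference derivatives at points in both branches of A (p > 0 and -1 < p < 0) is a cheap guard against sign errors before the expressions are used in a Newton-type iteration.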
APPENDIX E

GRADIENT AND HESSIAN OF THE GEV LOG-LIKELIHOOD

Consider the space defined by $A = \{-1 < p < 0,\ a > -p \cdot Y([nt_n]; [nt_n])\} \cup \{p > 0,\ a > 0\}$. On the space $A$, the gradient vector of the GEV log-likelihood
has elements

ort  /„  \s\  1  /i  \  i 


dp 


♦*§ 


3£gev{p 


da 


-1 


t=l 

fnt, 


-1/p 


On  the  space  A,  the  Hessian  matrix  of  the  GEV  log-likelihood  has  elements 

["t„ 


d2£GE v(p»°;  Yl  [wfn]  (  3 


dp 2 


-MHSM) 


-1 


-2 


a 

-(l/p)-l 


% 


d2lGEv(/>»a;  Y) 

da 2 


d2  ^GEV  (P»  °iy) 
5  pda 
APPENDIX F

DERIVATIVES OF HALL'S ESTIMATING EQUATIONS


For the estimating equations used for Hall's estimates for short tailed distributions,
the derivatives are given by

1  /  1  \  .  2  /  X-T„  \ 

- ai - -  ‘  M1  +  M  i  VH1  ~  ’ 

frMp,Q(i);X-rn)  _i,  m  _  ,_2 


{(■*»)- . i-^n. 


^i(p,Q(i);X-tw)  _av>2(p,q(i);X-rn) 

3Q(1)  dp 


rn  l 


(. 


X-T„ 


1-1 